[ 
https://issues.apache.org/jira/browse/MAHOUT-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945154#comment-13945154
 ] 

Reinis Vicups edited comment on MAHOUT-1486 at 3/24/14 2:24 PM:
----------------------------------------------------------------

This gives an NPE in combination with the -rskm option. (This was on Mahout 
0.8; Mr Marthi pointed me to Mahout 0.9, so this is likely no longer relevant 
there.) I am aware that --distanceMeasure is currently being ignored; I am 
just posting the original command that caused the NPE:

{code}
mahout streamingkmeans -i /output/tfidf-vectors -o /ticket-text-clusters/output \
  -k 230 -km 900 -rskm -ow \
  --distanceMeasure org.apache.mahout.common.distance.ChebyshevDistanceMeasure
{code}

About the number of points: I am not sure how to determine this. I ran 
seqdumper with -c on tfidf-vectors and got this:

{code}
Count: 328485
{code}


was (Author: reinis_v):
This gives an NPE in combination with the -rskm option (on Mahout 0.8; Mr 
Marthi pointed me to Mahout 0.9, so this is likely no longer relevant there):

{code}
mahout streamingkmeans -i /output/tfidf-vectors -o /ticket-text-clusters/output \
  -k 230 -km 900 -rskm -ow \
  --distanceMeasure org.apache.mahout.common.distance.ChebyshevDistanceMeasure
{code}

About the number of points: I am not sure how to determine this. I ran 
seqdumper with -c on tfidf-vectors and got this:

{code}
Count: 328485
{code}

> Streaming KMeans NPE
> --------------------
>
>                 Key: MAHOUT-1486
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1486
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.8
>            Reporter: Reinis Vicups
>            Assignee: Suneel Marthi
>             Fix For: 1.0
>
>
> I assume this occurs because the --reduceStreamingKMeans (-rskm) option is 
> set. I will test without the reduce step and report whether the NPE goes 
> away.
> Error: java.lang.NullPointerException
>         at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:187)
>         at org.apache.mahout.math.random.WeightedThing.<init>(WeightedThing.java:31)
>         at org.apache.mahout.math.neighborhood.BruteSearch.searchFirst(BruteSearch.java:127)
>         at org.apache.mahout.clustering.ClusteringUtils.estimateDistanceCutoff(ClusteringUtils.java:116)
>         at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansThread.call(StreamingKMeansThread.java:63)
>         at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:55)
>         at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:35)
>         at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
>         at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160)
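
The top frames of the trace show a null value reaching the WeightedThing 
constructor from BruteSearch.searchFirst. One way this shape of failure can 
arise, sketched below purely as an assumption and not as a diagnosis of this 
issue, is a brute-force nearest-neighbour search running over an empty 
reference set: the "best so far" reference is never assigned and a null check 
in the result wrapper fires. All class and method names in the sketch 
(NullBestMatchSketch, Weighted, the local searchFirst and chebyshev helpers) 
are hypothetical stand-ins written for illustration; none of this is Mahout's 
actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class NullBestMatchSketch {

    /** Hypothetical (value, weight) pair that rejects null values. */
    static final class Weighted<T> {
        final T value;
        final double weight;

        Weighted(T value, double weight) {
            // Same kind of guard as the checkNotNull frame at the top of the trace.
            this.value = Objects.requireNonNull(value, "value");
            this.weight = weight;
        }
    }

    /** Brute-force nearest neighbour; illustrative only, not Mahout's implementation. */
    static Weighted<double[]> searchFirst(List<double[]> reference, double[] query) {
        double[] best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (double[] candidate : reference) {
            double d = chebyshev(candidate, query);
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        // With an empty reference list, best is still null at this point,
        // so the constructor below throws NullPointerException.
        return new Weighted<>(best, bestDistance);
    }

    /** Chebyshev distance: the largest absolute coordinate difference. */
    static double chebyshev(double[] a, double[] b) {
        double max = 0.0;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    public static void main(String[] args) {
        double[] query = {1.0, 2.0};
        List<double[]> points = new ArrayList<>();
        try {
            searchFirst(points, query);
        } catch (NullPointerException e) {
            System.out.println("empty reference set: NullPointerException");
        }
        points.add(new double[] {1.5, 4.0});
        System.out.println("distance = " + searchFirst(points, query).weight);
    }
}
```

If something like this is what happens in the reducer, checking whether any 
points actually reach the searcher before the distance cutoff is estimated 
would be the first thing to verify. Whether that is the case here is still 
unconfirmed.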



--
This message was sent by Atlassian JIRA
(v6.2#6252)
