[
https://issues.apache.org/jira/browse/MAHOUT-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945154#comment-13945154
]
Reinis Vicups edited comment on MAHOUT-1486 at 3/24/14 2:24 PM:
----------------------------------------------------------------
This gives an NPE in combination with the -rskm option. (This was on Mahout 0.8;
Mr Marthi pointed me to Mahout 0.9, so this is likely no longer relevant there.)
I am aware that --distanceMeasure is currently ignored; I am just posting the
original command that caused the NPE:
{code}
mahout streamingkmeans -i /output/tfidf-vectors -o /ticket-text-clusters/output \
  -k 230 -km 900 -rskm -ow \
  --distanceMeasure org.apache.mahout.common.distance.ChebyshevDistanceMeasure
{code}
About the number of points: I am not sure how to determine this. I ran seqdumper
with -c on tfidf-vectors and got this:
{code}
Count: 328485
{code}
was (Author: reinis_v):
This gives an NPE in combination with the -rskm option. (This was on Mahout 0.8;
Mr Marthi pointed me to Mahout 0.9, so this is likely no longer relevant there.):
{code}
mahout streamingkmeans -i /output/tfidf-vectors -o /ticket-text-clusters/output
-k 230 -km 900 -rskm -ow --distanceMeasure
org.apache.mahout.common.distance.ChebyshevDistanceMeasure
{code}
About the number of points: I am not sure how to determine this. I ran seqdumper
with -c on tfidf-vectors and got this:
{code}
Count: 328485
{code}
> Streaming KMeans NPE
> --------------------
>
> Key: MAHOUT-1486
> URL: https://issues.apache.org/jira/browse/MAHOUT-1486
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.8
> Reporter: Reinis Vicups
> Assignee: Suneel Marthi
> Fix For: 1.0
>
>
> I am assuming that this occurs because the --reduceStreamingKMeans (-rskm)
> option is set. I will test it without the reduce step and report whether the
> NPE goes away.
> Error: java.lang.NullPointerException
>     at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:187)
>     at org.apache.mahout.math.random.WeightedThing.<init>(WeightedThing.java:31)
>     at org.apache.mahout.math.neighborhood.BruteSearch.searchFirst(BruteSearch.java:127)
>     at org.apache.mahout.clustering.ClusteringUtils.estimateDistanceCutoff(ClusteringUtils.java:116)
>     at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansThread.call(StreamingKMeansThread.java:63)
>     at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:55)
>     at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:35)
>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160)
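
The top frames of the trace show a null check failing in a constructor that is called from a nearest-neighbour search. One plausible reading, not confirmed anywhere in this issue, is that the search found no candidate other than the query point, so its "best" result was still null when it was wrapped. The sketch below reproduces that failure mode in isolation. It is not Mahout's code: the class SearchFirstNpeSketch, the Weighted pair, and the Euclidean distance are all hypothetical stand-ins chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class SearchFirstNpeSketch {

    /** Hypothetical (value, weight) pair that rejects a null value. */
    static final class Weighted {
        final double[] value;
        final double weight;

        Weighted(double[] value, double weight) {
            // Stand-in for the null check seen at the top of the stack trace.
            this.value = Objects.requireNonNull(value, "value");
            this.weight = weight;
        }
    }

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Brute-force nearest neighbour that skips candidates equal to the query. */
    static Weighted searchFirst(List<double[]> points, double[] query) {
        double[] best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (double[] candidate : points) {
            if (Arrays.equals(candidate, query)) {
                continue; // never return the query itself
            }
            double distance = euclidean(query, candidate);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = candidate;
            }
        }
        // If every candidate was skipped, best is still null and this throws.
        return new Weighted(best, bestDistance);
    }

    public static void main(String[] args) {
        List<double[]> points = new ArrayList<>();
        points.add(new double[] {1.0, 2.0});

        try {
            searchFirst(points, points.get(0));
            System.out.println("no exception");
        } catch (NullPointerException e) {
            System.out.println("NPE: only the query itself was available");
        }

        // With a second, distinct point the search succeeds.
        points.add(new double[] {4.0, 6.0});
        Weighted nearest = searchFirst(points, points.get(0));
        System.out.println("nearest distance = " + nearest.weight);
    }
}
```

If this reading applies, the failure would depend on how many distinct points reach a given search call rather than on the total count of the input, which is why checking the number of points is a useful diagnostic step.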