We need to refactor some of the T-Digest stuff; the T-Digest that's been plumbed into Mahout 0.9 was a very early version that only had a TreeDigest. We now have AVLTreeDigest and better methods available. It would be a good idea to create a JIRA for that and redo the T-Digest stuff in Mahout.
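For context, a rough sketch of what the newer artifact buys us, assuming we pin a t-digest release that ships AVLTreeDigest (i.e. something newer than what 0.9 pulled in); the constructor and method names below are from that artifact and worth double-checking against whatever version we settle on:

import com.tdunning.math.stats.{AVLTreeDigest, TDigest}

object TDigestSketch {
  def main(args: Array[String]): Unit = {
    // The old artifact only offered TreeDigest; the AVL-backed digest is
    // the newer, faster structure. Compression trades accuracy for size.
    val digest: TDigest = new AVLTreeDigest(100.0)
    val rnd = new scala.util.Random(1234)
    (1 to 100000).foreach(_ => digest.add(rnd.nextGaussian()))
    // Quantile and CDF queries are what OnlineSummarizer-style stats need.
    println(s"median ~ ${digest.quantile(0.5)}")
    println(s"p99    ~ ${digest.quantile(0.99)}")
    println(s"cdf(0) ~ ${digest.cdf(0.0)}")
  }
}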

On Fri, Apr 3, 2015 at 8:25 PM, Andrew Palumbo <[email protected]> wrote:

> Sorry, it's TreeDigest that will be missing, not OnlineSummarizer, which is
> in org.apache.mahout.math.stats and imports:
>
> com.tdunning.math.stats.TDigest
> com.tdunning.math.stats.TreeDigest
>
> On 04/03/2015 08:17 PM, Andrew Palumbo wrote:
>
>> The problem is (if I understand correctly) that Stream-lib has
>> TDigest.java but not the rest of the classes in the t-digest artifact,
>> e.g. OnlineSummarizer, which is used by the ResultAnalyzer class that I
>> ported over from MrLegacy to math-scala in the confusion matrix (also
>> ported to math-scala).
>>
>> https://github.com/addthis/stream-lib/tree/master/src/main/java/com/clearspring/analytics/stream/quantile
>>
>> I've added:
>>
>> <include>com.tdunning:t-digest</include>
>> <include>org.apache.commons:commons-math3</include>
>>
>> to spark/src/main/assembly/dependency-reduced.xml to include these jars
>> in the Spark naive Bayes CLI launcher.
>>
>> I believe that the dependency-reduced jar was slimmed down from the
>> entire MrLegacy module, which included the t-digest artifact, to the few
>> dependencies that we have in it now.
>>
>> Not including these in the dependency-reduced jar leads to this exception:
>>
>> Exception in thread "main" java.lang.NoClassDefFoundError: com/tdunning/math/stats/TDigest
>>     at org.apache.mahout.classifier.stats.ResultAnalyzer.<init>(ClassifierStats.scala:64)
>>     at org.apache.mahout.classifier.naivebayes.NaiveBayes$class.test(NaiveBayes.scala:303)
>>     at org.apache.mahout.classifier.naivebayes.NaiveBayes$.test(NaiveBayes.scala:336)
>>     at org.apache.mahout.drivers.TestNBDriver$.process(TestNBDriver.scala:105)
>>     at org.apache.mahout.drivers.TestNBDriver$$anonfun$main$1.apply(TestNBDriver.scala:77)
>>     at org.apache.mahout.drivers.TestNBDriver$$anonfun$main$1.apply(TestNBDriver.scala:75)
>>     at scala.Option.map(Option.scala:145)
>>
>> I'm not sure that this is causing the exception below, but it does seem
>> possible.
>>
>> On 04/03/2015 07:26 PM, Suneel Marthi wrote:
>>
>>> You shouldn't be adding T-Digest again to the Spark modules (since
>>> Stream-lib in Spark already has one).
>>>
>>> T-Digest is needed for MrLegacy and should be added as a dependency.
>>>
>>> On Fri, Apr 3, 2015 at 7:00 PM, Andrew Palumbo <[email protected]> wrote:
>>>
>>>> I'm wondering if it could be caused by the TDigest class from the
>>>> artifact that I added to the dependency-reduced jar conflicting with
>>>> the Spark TDigest class which, as you pointed out the other day, is on
>>>> the Spark classpath. The exception is coming right when the summarizer
>>>> is being used by the confusion matrix.
>>>>
>>>> On 04/03/2015 06:22 PM, Dmitriy Lyubimov wrote:
>>>>
>>>>> Saw a lot of these, some still bewildering, but they all related to
>>>>> non-local mode (different classpaths on the back end and front end).
>>>>>
>>>>> On Fri, Apr 3, 2015 at 1:39 PM, Andrew Palumbo <[email protected]> wrote:
>>>>>
>>>>>> Has anybody seen an exception like this when running a Spark job?
>>>>>> The job completes, but this exception is reported in the middle.
>>>>>>
>>>>>> 15/04/02 12:43:54 ERROR Remoting: org.apache.spark.storage.BlockManagerId; local class incompatible: stream classdesc serialVersionUID = 2439208141545036836, local class serialVersionUID = -7366074099953117729
>>>>>> java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId; local class incompatible: stream classdesc serialVersionUID = 2439208141545036836, local class serialVersionUID = -7366074099953117729
>>>>>>     at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
>>>>>>     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
>>>>>>     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>>>>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>>>>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>>>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>>>>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>>>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>>>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>>>>>     at akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136)
>>>>>>     at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
>>>>>>     at akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136)
>>>>>>     at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104)
>>>>>>     at scala.util.Try$.apply(Try.scala:161)
>>>>>>     at akka.serialization.Serialization.deserialize(Serialization.scala:98)
>>>>>>     at akka.remote.MessageSerializer$.deserialize(MessageSerializer.scala:23)
>>>>>>     at akka.remote.DefaultMessageDispatcher.payload$lzycompute$1(Endpoint.scala:55)
>>>>>>     at akka.remote.DefaultMessageDispatcher.payload$1(Endpoint.scala:55)
>>>>>>     at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:73)
>>>>>>     at akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:764)
>>>>>>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>>>>>     at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>>>>>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>>>>>     at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>>>>>     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>>>>>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>>>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>>>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>>>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
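
One note on the conflict theory in the thread: going by the imports and the stream-lib URL quoted above, the two TDigest classes live in different packages (com.tdunning.math.stats vs. com.clearspring.analytics.stream.quantile), so if that's right they shouldn't shade each other, and the NoClassDefFoundError just means the com.tdunning jar is absent, which is what the <include> fix addresses. A quick way to check where each class actually resolves from, if anyone wants to verify; this is plain JDK reflection, nothing Mahout-specific, and the WhichJar name is made up for illustration:

object WhichJar {
  // Print the code source (jar or directory) a class is loaded from,
  // or a marker if it's missing from the classpath entirely.
  def locate(className: String): Unit =
    try {
      val cls = Class.forName(className)
      val src = Option(cls.getProtectionDomain.getCodeSource)
        .map(_.getLocation.toString)
        .getOrElse("<bootstrap/unknown>")
      println(s"$className -> $src")
    } catch {
      case _: ClassNotFoundException => println(s"$className -> NOT FOUND")
    }

  def main(args: Array[String]): Unit = {
    locate("com.tdunning.math.stats.TDigest")                    // t-digest artifact
    locate("com.clearspring.analytics.stream.quantile.TDigest")  // stream-lib's copy
  }
}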

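As for the InvalidClassException at the bottom of the thread: that is plain Java serialization versioning. The org.apache.spark.storage.BlockManagerId compiled into the cluster's Spark build has a different serialVersionUID than the one on the submitting side, which fits Dmitriy's "different classpaths on the back end and front end" diagnosis. A minimal way to print the local UID for comparison (hypothetical helper; the lookup is standard JDK API, and the class is known serializable since the trace shows it being Java-deserialized):

import java.io.ObjectStreamClass

object SerialUidCheck {
  def main(args: Array[String]): Unit = {
    // Resolve the class named in the exception from whatever Spark jar
    // is on the current classpath.
    val cls = Class.forName("org.apache.spark.storage.BlockManagerId")
    // ObjectStreamClass.lookup returns the serialization descriptor for
    // a serializable class (non-null here, per the stack trace above).
    val uid = ObjectStreamClass.lookup(cls).getSerialVersionUID
    println(s"local serialVersionUID of ${cls.getName} = $uid")
    // Run this once against the client's Spark jar and once against the
    // cluster's: mismatched numbers (e.g. 2439208141545036836 vs.
    // -7366074099953117729) confirm two different Spark builds in play.
  }
}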