We need to refactor some of the T-Digest code; the T-Digest that was
plumbed into Mahout 0.9 was a very early version that only had a TreeDigest.

We now have AVLTreeDigest and better methods available. It would be a good
idea to create a JIRA for that and redo the T-Digest integration in Mahout.
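
For reference, the newer artifact's API looks roughly like this (a minimal
sketch from memory; the constructor and method names are assumptions about
the current t-digest release, not copied from our code):

    import com.tdunning.math.stats.AVLTreeDigest;

    public class AvlDigestExample {
        public static void main(String[] args) {
            // 100 is the compression parameter: higher means better
            // quantile accuracy at the cost of more centroids in memory.
            AVLTreeDigest digest = new AVLTreeDigest(100);
            for (double sample : new double[]{1, 2, 3, 4, 5}) {
                digest.add(sample);
            }
            System.out.println(digest.quantile(0.5)); // estimated median
            System.out.println(digest.cdf(3.0));      // estimated P(X <= 3)
        }
    }

Call sites that build a TreeDigest today should map over almost one-to-one,
since both classes are TDigest subclasses.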


On Fri, Apr 3, 2015 at 8:25 PM, Andrew Palumbo <[email protected]> wrote:

> Sorry, it's TreeDigest that will be missing, not OnlineSummarizer, which is
> in org.apache.mahout.math.stats and imports:
>
> com.tdunning.math.stats.TDigest
> com.tdunning.math.stats.TreeDigest
>
>
>
>
> On 04/03/2015 08:17 PM, Andrew Palumbo wrote:
>
>> The problem is (if I understand correctly) that Stream-lib has
>> TDigest.java but not the rest of the classes in the t-digest artifact,
>> e.g. OnlineSummarizer, which is used by the ResultAnalyzer class that I
>> ported over from MrLegacy to math-scala, and by the confusion matrix
>> (also ported to math-scala).
>>
>> https://github.com/addthis/stream-lib/tree/master/src/
>> main/java/com/clearspring/analytics/stream/quantile
>>
>> I've added:
>>
>>      <include>com.tdunning:t-digest</include>
>> <include>org.apache.commons:commons-math3</include>
>>
>> to spark/src/main/assembly/dependency-reduced.xml to include these jars
>> in the spark naive-bayes CLI launcher.
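>>
>> For context, those includes sit inside the descriptor's dependencySet,
>> roughly like this (a sketch; the surrounding elements are written from
>> memory of the maven-assembly-plugin descriptor format, not copied from
>> the actual file):
>>
>>     <dependencySets>
>>       <dependencySet>
>>         <includes>
>>           <include>com.tdunning:t-digest</include>
>>           <include>org.apache.commons:commons-math3</include>
>>         </includes>
>>       </dependencySet>
>>     </dependencySets>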
>>
>> I believe that the dependency-reduced jar was slimmed down from the
>> entire MrLegacy module, which included the t-digest artifact, to the few
>> dependencies that we have in it now.
>>
>> Not including these in the dependency-reduced jar leads to this exception:
>>
>>  Exception in thread "main" java.lang.NoClassDefFoundError: com/tdunning/math/stats/TDigest
>>         at org.apache.mahout.classifier.stats.ResultAnalyzer.<init>(ClassifierStats.scala:64)
>>         at org.apache.mahout.classifier.naivebayes.NaiveBayes$class.test(NaiveBayes.scala:303)
>>         at org.apache.mahout.classifier.naivebayes.NaiveBayes$.test(NaiveBayes.scala:336)
>>         at org.apache.mahout.drivers.TestNBDriver$.process(TestNBDriver.scala:105)
>>         at org.apache.mahout.drivers.TestNBDriver$$anonfun$main$1.apply(TestNBDriver.scala:77)
>>         at org.apache.mahout.drivers.TestNBDriver$$anonfun$main$1.apply(TestNBDriver.scala:75)
>>         at scala.Option.map(Option.scala:145)
>>
>>
>> I'm not sure that this is causing the exception below, but it does seem
>> possible.
>>
>> On 04/03/2015 07:26 PM, Suneel Marthi wrote:
>>
>>> You shouldn't be adding T-Digest again to Spark modules (since Stream-lib
>>> in Spark already has one).
>>>
>>> T-Digest is needed for MrLegacy and should be added as a dependency.
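>>>
>>> i.e. something like this in the MrLegacy pom (a sketch only; the
>>> version would be managed wherever we keep it, so I'm leaving it out):
>>>
>>>     <dependency>
>>>       <groupId>com.tdunning</groupId>
>>>       <artifactId>t-digest</artifactId>
>>>     </dependency>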
>>>
>>> On Fri, Apr 3, 2015 at 7:00 PM, Andrew Palumbo <[email protected]>
>>> wrote:
>>>
>>>> I'm wondering if it could be caused by the TDigest class from the
>>>> artifact that I added to the dependency-reduced jar conflicting with the
>>>> Spark TDigest class which, as you pointed out the other day, is on the
>>>> Spark classpath. The exception comes right when the summarizer is
>>>> being used by the confusion matrix.
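>>>>
>>>> One quick way to check that (a throwaway diagnostic sketch, not
>>>> anything in our code) would be to print which jar TDigest was
>>>> actually loaded from, once on the driver and once inside a task:
>>>>
>>>>     // Prints the jar that supplied TDigest on this JVM's classpath.
>>>>     // (getCodeSource() can be null for JDK bootstrap classes, but
>>>>     // not for a class that comes from an application jar.)
>>>>     System.out.println(com.tdunning.math.stats.TDigest.class
>>>>         .getProtectionDomain().getCodeSource().getLocation());
>>>>
>>>> If the two print different jars, that would confirm the conflict.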
>>>>
>>>>
>>>> On 04/03/2015 06:22 PM, Dmitriy Lyubimov wrote:
>>>>
>>>>> I've seen a lot of these, some still bewildering, but they were all
>>>>> related to non-local mode (different classpaths on the backend and
>>>>> front end).
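>>>>>
>>>>> Concretely: when a Serializable class doesn't pin serialVersionUID,
>>>>> Java derives one from the class's structure, so a front end and a
>>>>> backend built against different Spark versions can compute different
>>>>> IDs for BlockManagerId, and deserialization fails exactly like the
>>>>> trace below. A minimal illustration (hypothetical class, not Spark
>>>>> code):
>>>>>
>>>>>     import java.io.Serializable;
>>>>>
>>>>>     class Point implements Serializable {
>>>>>         // Without this field the JVM computes a serialVersionUID
>>>>>         // from the class structure, so recompiling with any
>>>>>         // structural change makes old streams fail to deserialize
>>>>>         // with InvalidClassException. Pinning it explicitly keeps
>>>>>         // serialized forms compatible across builds:
>>>>>         private static final long serialVersionUID = 1L;
>>>>>
>>>>>         double x, y;
>>>>>     }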
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Apr 3, 2015 at 1:39 PM, Andrew Palumbo <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Has anybody seen an exception like this when running a Spark job?
>>>>>> The job completes, but this exception is reported in the middle.
>>>>>>
>>>>>> 15/04/02 12:43:54 ERROR Remoting: org.apache.spark.storage.BlockManagerId; local class incompatible: stream classdesc serialVersionUID = 2439208141545036836, local class serialVersionUID = -7366074099953117729
>>>>>> java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId; local class incompatible: stream classdesc serialVersionUID = 2439208141545036836, local class serialVersionUID = -7366074099953117729
>>>>>>       at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
>>>>>>       at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
>>>>>>       at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>>>>>>       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>>>>>>       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>>>>       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>>>>>       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>>>>>       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>>>>       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>>>>       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>>>>>       at akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136)
>>>>>>       at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
>>>>>>       at akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136)
>>>>>>       at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104)
>>>>>>       at scala.util.Try$.apply(Try.scala:161)
>>>>>>       at akka.serialization.Serialization.deserialize(Serialization.scala:98)
>>>>>>       at akka.remote.MessageSerializer$.deserialize(MessageSerializer.scala:23)
>>>>>>       at akka.remote.DefaultMessageDispatcher.payload$lzycompute$1(Endpoint.scala:55)
>>>>>>       at akka.remote.DefaultMessageDispatcher.payload$1(Endpoint.scala:55)
>>>>>>       at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:73)
>>>>>>       at akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:764)
>>>>>>       at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>>>>>       at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>>>>>       at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>>>>>       at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>>>>>       at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>>>>>       at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>>>       at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>>>       at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>>>       at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>>>
>>>>>>
>>>>>>
>>
>>
>
