The problem (if I understand correctly) is that Stream-lib has TDigest.java but not the rest of the classes in the t-digest artifact, e.g. OnlineSummarizer, which is used by the ResultAnalyzer class that I ported over from MrLegacy to Math-Scala, along with the confusion matrix (also ported to math-scala).

https://github.com/addthis/stream-lib/tree/master/src/main/java/com/clearspring/analytics/stream/quantile

I've added:

     <include>com.tdunning:t-digest</include>
     <include>org.apache.commons:commons-math3</include>

to spark/src/main/assembly/dependency-reduced.xml to include these jars in the Spark naive Bayes CLI launcher.

I believe that the dependency-reduced jar was slimmed down from the entire MrLegacy module (which included the t-digest artifact) to the few dependencies that we have in it now.

Not including these in the dependency-reduced jar leads to this exception:

Exception in thread "main" java.lang.NoClassDefFoundError: com/tdunning/math/stats/TDigest
        at org.apache.mahout.classifier.stats.ResultAnalyzer.<init>(ClassifierStats.scala:64)
        at org.apache.mahout.classifier.naivebayes.NaiveBayes$class.test(NaiveBayes.scala:303)
        at org.apache.mahout.classifier.naivebayes.NaiveBayes$.test(NaiveBayes.scala:336)
        at org.apache.mahout.drivers.TestNBDriver$.process(TestNBDriver.scala:105)
        at org.apache.mahout.drivers.TestNBDriver$$anonfun$main$1.apply(TestNBDriver.scala:77)
        at org.apache.mahout.drivers.TestNBDriver$$anonfun$main$1.apply(TestNBDriver.scala:75)
        at scala.Option.map(Option.scala:145)

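As a quick sanity check, one can probe the classloader directly to see whether a class is visible at runtime before wiring up the assembly. This is a generic diagnostic sketch, not Mahout code; the TDigest class name is used here only as an example probe (it will report "missing" unless t-digest is actually on the classpath):

```java
public class ClassCheck {
    public static void main(String[] args) {
        // Probe whether each class can be resolved on the runtime classpath,
        // without initializing it (initialize = false).
        String[] names = { "java.util.ArrayList", "com.tdunning.math.stats.TDigest" };
        for (String name : names) {
            try {
                Class.forName(name, false, ClassCheck.class.getClassLoader());
                System.out.println(name + " found");
            } catch (ClassNotFoundException e) {
                System.out.println(name + " missing");
            }
        }
    }
}
```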

I'm not sure that this is causing the exception below, but it does seem possible.

On 04/03/2015 07:26 PM, Suneel Marthi wrote:
You shouldn't be adding T-Digest again to Spark modules (since Stream-lib
in Spark already has one).

T-Digest is needed for MrLegacy and should be added as a dependency.

On Fri, Apr 3, 2015 at 7:00 PM, Andrew Palumbo <[email protected]> wrote:

I'm wondering if it could be caused by the TDigest class from the artifact that I added to the dependency-reduced jar conflicting with the Spark TDigest class which, as you pointed out the other day, is on the Spark classpath. The exception is coming right when the summarizer is being used by the confusion matrix.
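When two jars may both bundle the same class, it can help to ask the classloader which copy actually won. A generic diagnostic sketch (not part of Mahout); java.lang.String is used as a stand-in because it always resolves, and in practice you would pass com.tdunning.math.stats.TDigest:

```java
import java.security.CodeSource;

public class WhichJar {
    public static void main(String[] args) throws Exception {
        // Resolve the class, then ask its protection domain where it was loaded from.
        // Classes on the bootstrap classpath report a null CodeSource; classes from
        // an application jar report that jar's URL.
        Class<?> c = Class.forName("java.lang.String"); // stand-in; try TDigest in practice
        CodeSource src = c.getProtectionDomain().getCodeSource();
        System.out.println(src == null ? "bootstrap classpath" : src.getLocation().toString());
    }
}
```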


On 04/03/2015 06:22 PM, Dmitriy Lyubimov wrote:

Saw a lot of these, some still bewildering, but they all related to non-local mode (different classpaths on the backend and front end).



On Fri, Apr 3, 2015 at 1:39 PM, Andrew Palumbo <[email protected]>
wrote:

  Has anybody seen an exception like this when running a Spark job? The job completes but this exception is reported in the middle.

15/04/02 12:43:54 ERROR Remoting: org.apache.spark.storage.BlockManagerId;
local class incompatible: stream classdesc serialVersionUID =
2439208141545036836, local class serialVersionUID = -7366074099953117729
java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId;
local class incompatible: stream classdesc serialVersionUID =
2439208141545036836, local class serialVersionUID = -7366074099953117729
      at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
      at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
      at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
      at akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136)
      at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
      at akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136)
      at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104)
      at scala.util.Try$.apply(Try.scala:161)
      at akka.serialization.Serialization.deserialize(Serialization.scala:98)
      at akka.remote.MessageSerializer$.deserialize(MessageSerializer.scala:23)
      at akka.remote.DefaultMessageDispatcher.payload$lzycompute$1(Endpoint.scala:55)
      at akka.remote.DefaultMessageDispatcher.payload$1(Endpoint.scala:55)
      at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:73)
      at akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:764)
      at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
      at akka.actor.ActorCell.invoke(ActorCell.scala:456)
      at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
      at akka.dispatch.Mailbox.run(Mailbox.scala:219)
      at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
      at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
      at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
      at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
      at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
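For context, this InvalidClassException is what Java serialization reports whenever the sender and receiver hold different builds of the same Serializable class (e.g. mismatched Spark versions between driver and executors). A minimal sketch of where those UID values come from, using a toy Serializable class rather than Spark's BlockManagerId:

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class SerialUidDemo {
    // Toy stand-in for a class like BlockManagerId. If this explicit field
    // were absent, each build would compute a UID from the class's shape, so
    // any structural difference between the two classpaths would yield the
    // "local class incompatible" failure above.
    static class Payload implements Serializable {
        private static final long serialVersionUID = 42L;
        int value;
    }

    public static void main(String[] args) {
        // ObjectStreamClass reports the UID the serialization machinery
        // will compare during deserialization.
        long uid = ObjectStreamClass.lookup(Payload.class).getSerialVersionUID();
        System.out.println("serialVersionUID = " + uid);
    }
}
```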


