The problem is (if I understand correctly) that Stream-lib has
TDigest.java but not the rest of the classes in the t-digest artifact,
e.g. OnlineSummarizer, which is used by the ResultAnalyzer class that I
ported over from MrLegacy to math-scala for the confusion matrix (also
ported to math-scala).
https://github.com/addthis/stream-lib/tree/master/src/main/java/com/clearspring/analytics/stream/quantile
I've added:
<include>com.tdunning:t-digest</include>
<include>org.apache.commons:commons-math3</include>
to the spark/src/main/assembly/dependency-reduced.xml
to include these jars in the Spark naive bayes CLI launcher.
I believe that the dependency-reduced jar was slimmed down from the
entire MrLegacy module (which included the t-digest artifact) to the few
dependencies that we have in it now.
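For context, here is a minimal sketch of where those two includes would sit in the assembly descriptor, assuming dependency-reduced.xml uses a standard Maven assembly dependencySet (the surrounding structure below is illustrative, not copied from the actual file):

```xml
<dependencySets>
  <dependencySet>
    <includes>
      <!-- ...existing includes... -->
      <include>com.tdunning:t-digest</include>
      <include>org.apache.commons:commons-math3</include>
    </includes>
  </dependencySet>
</dependencySets>
```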
Not including these in the dependency-reduced jar leads to this exception:
Exception in thread "main" java.lang.NoClassDefFoundError: com/tdunning/math/stats/TDigest
    at org.apache.mahout.classifier.stats.ResultAnalyzer.<init>(ClassifierStats.scala:64)
    at org.apache.mahout.classifier.naivebayes.NaiveBayes$class.test(NaiveBayes.scala:303)
    at org.apache.mahout.classifier.naivebayes.NaiveBayes$.test(NaiveBayes.scala:336)
    at org.apache.mahout.drivers.TestNBDriver$.process(TestNBDriver.scala:105)
    at org.apache.mahout.drivers.TestNBDriver$$anonfun$main$1.apply(TestNBDriver.scala:77)
    at org.apache.mahout.drivers.TestNBDriver$$anonfun$main$1.apply(TestNBDriver.scala:75)
    at scala.Option.map(Option.scala:145)
I'm not sure that this is causing the exception below, but it does seem
possible.
On 04/03/2015 07:26 PM, Suneel Marthi wrote:
You shouldn't be adding T-Digest again to Spark modules (since Stream-lib
in Spark already has one).
T-Digest is needed for MrLegacy and should be added as a dependency.
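If it goes in as a regular Maven dependency of the MrLegacy module, that would look something like the following in its pom.xml (the version number here is an assumption, not taken from the thread):

```xml
<dependency>
  <groupId>com.tdunning</groupId>
  <artifactId>t-digest</artifactId>
  <!-- version is an assumption; use whatever the build already manages -->
  <version>3.0</version>
</dependency>
```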
On Fri, Apr 3, 2015 at 7:00 PM, Andrew Palumbo <[email protected]> wrote:
I'm wondering if it could be caused by the TDigest class from the
artifact that I added to the dependency-reduced jar conflicting with the
spark TDigest class which, as you pointed out the other day, is on the
spark classpath. The exception is coming right when the summarizer is
being used by the confusion matrix.
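One way to check for that kind of conflict is to ask the JVM which jar a class was actually loaded from. This is a hypothetical diagnostic, not part of Mahout; on a real Mahout classpath you would pass com.tdunning.math.stats.TDigest instead of the stand-in class used here:

```java
import java.security.CodeSource;

public class ClassOrigin {
    // Returns the location (usually a jar URL) a class was loaded from,
    // or a placeholder for bootstrap-loaded classes, which have no CodeSource.
    static String originOf(Class<?> clazz) {
        CodeSource src = clazz.getProtectionDomain().getCodeSource();
        return src == null ? "<bootstrap or unknown>" : src.getLocation().toString();
    }

    public static void main(String[] args) {
        // Stand-in: on a real Mahout classpath, inspect
        // com.tdunning.math.stats.TDigest to see which copy wins.
        System.out.println(originOf(String.class));
    }
}
```

Running this on both the driver and the executors would show whether the dependency-reduced jar's TDigest or Spark's stream-lib copy is being picked up.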
On 04/03/2015 06:22 PM, Dmitriy Lyubimov wrote:
saw a lot of these, some still bewildering, but they all related to
non-local mode (different classpaths on backend and front end).
On Fri, Apr 3, 2015 at 1:39 PM, Andrew Palumbo <[email protected]>
wrote:
Has anybody seen an exception like this when running a spark job?
The job completes, but this exception is reported in the middle.
15/04/02 12:43:54 ERROR Remoting: org.apache.spark.storage.BlockManagerId;
local class incompatible: stream classdesc serialVersionUID =
2439208141545036836, local class serialVersionUID = -7366074099953117729
java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId;
local class incompatible: stream classdesc serialVersionUID =
2439208141545036836, local class serialVersionUID = -7366074099953117729
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
    at akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136)
    at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104)
    at scala.util.Try$.apply(Try.scala:161)
    at akka.serialization.Serialization.deserialize(Serialization.scala:98)
    at akka.remote.MessageSerializer$.deserialize(MessageSerializer.scala:23)
    at akka.remote.DefaultMessageDispatcher.payload$lzycompute$1(Endpoint.scala:55)
    at akka.remote.DefaultMessageDispatcher.payload$1(Endpoint.scala:55)
    at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:73)
    at akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:764)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
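An InvalidClassException like the one above means the two sides deserialized different builds of the same class. A quick way to confirm is to print the serialVersionUID the JVM actually sees on each classpath; this is a hypothetical helper using the standard ObjectStreamClass API, shown here with a stand-in class rather than Spark's BlockManagerId:

```java
import java.io.ObjectStreamClass;

public class SuidCheck {
    // Returns the serialVersionUID the serialization machinery will use
    // for a Serializable class (explicit if declared, computed otherwise).
    static long suidOf(Class<?> clazz) {
        return ObjectStreamClass.lookup(clazz).getSerialVersionUID();
    }

    public static void main(String[] args) {
        // Stand-in: on a real cluster you would run this for
        // org.apache.spark.storage.BlockManagerId on both the driver
        // and the executors and compare the two values.
        System.out.println(suidOf(String.class));
    }
}
```

If the two printed values differ, the driver and executors are running different Spark builds, which matches the "different classpaths on backend and front end" diagnosis above.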