[ 
https://issues.apache.org/jira/browse/MAHOUT-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068297#comment-17068297
 ] 

Tariq Jawed commented on MAHOUT-2099:
-------------------------------------

[~Andrew_Palumbo] I have installed Mahout in the Docker container and set 
MAHOUT_HOME as well, but I noticed that my Scala version is 2.11 and Spark is 2.3. 
Will it work with these POM dependencies?
{code:xml}
<dependency>
 <groupId>org.apache.mahout</groupId>
 <artifactId>mahout-math</artifactId>
 <version>0.13.0</version>
</dependency>
<dependency>
 <groupId>org.apache.mahout</groupId>
 <artifactId>mahout-math-scala_2.10</artifactId>
 <version>0.13.0</version>
 <!--<scope>system</scope>
 <systemPath>${basedir}/data/mahout-libs/mahout-math-scala_2.11-0.13.0.jar</systemPath>-->
</dependency>
<dependency>
 <groupId>org.apache.mahout</groupId>
 <artifactId>mahout-spark_2.10</artifactId>
 <version>0.13.0</version>
 <!--<scope>system</scope>
 <systemPath>${basedir}/data/mahout-libs/mahout-spark_2.11-0.13.0.jar</systemPath>-->
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.mahout/mahout-hdfs -->
<dependency>
 <groupId>org.apache.mahout</groupId>
 <artifactId>mahout-hdfs</artifactId>
 <version>0.13.0</version>
</dependency>
{code}

Error:
{code}
20/03/27 05:54:25 ERROR executor.Executor: Exception in task 0.0 in stage 12.0 (TID 11)
java.io.NotSerializableException: org.apache.mahout.math.DenseVector
Serialization stack:
        - object not serializable (class: org.apache.mahout.math.DenseVector, value: {0:1.0,1:1.0,2:1.0,3:1.0,4:1.0,5:4.0,6:1.0,7:1.0,8:1.0,9:2.0,10:1.0,11:1.0,12:1.0,13:1.0,14:1.0,15:1.0,16:1.0,17:1.0,18:1.0,19:1.0,20:1.0,21:1.0,22:7.0,23:1.0,24:1.0,25:1.0,26:1.0,27:1.0,28:1.0,29:1.0,30:1.0,31:1.0,32:1.0,33:1.0,34:1.0,35:1.0,36:3.0,37:1.0,38:1.0,39:1.0,40:1.0})
        - field (class: scala.Some, name: x, type: class java.lang.Object)
        - object (class scala.Some, Some({0:1.0,1:1.0,2:1.0,3:1.0,4:1.0,5:4.0,6:1.0,7:1.0,8:1.0,9:2.0,10:1.0,11:1.0,12:1.0,13:1.0,14:1.0,15:1.0,16:1.0,17:1.0,18:1.0,19:1.0,20:1.0,21:1.0,22:7.0,23:1.0,24:1.0,25:1.0,26:1.0,27:1.0,28:1.0,29:1.0,30:1.0,31:1.0,32:1.0,33:1.0,34:1.0,35:1.0,36:3.0,37:1.0,38:1.0,39:1.0,40:1.0}))
{code}
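
For reference, this is roughly how I am trying to set up the context. It is only a sketch: it assumes the Kryo registrator class shipped with the mahout-spark bindings (org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator) is the right one for 0.13.0, and that these properties take effect before the SparkContext exists (on a managed cluster they may have to go through spark-submit --conf instead of code):
{code:scala}
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession
import org.apache.mahout.sparkbindings._

// Sketch only: Mahout's DenseVector is not java.io.Serializable, so the Kryo
// serializer and Mahout's registrator must be configured on the context.
val spark = SparkSession.builder()
  .appName("CooccurrenceDriver")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .config("spark.kryo.registrator",
    "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator")
  .getOrCreate()

// Wrap the existing SparkContext rather than calling mahoutSparkContext(),
// so MAHOUT_HOME is not needed on the cluster.
implicit val sc: SparkContext = spark.sparkContext
implicit val msc: SparkDistributedContext = sc2sdc(sc)
{code}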

> Using Mahout as a Library in Spark Cluster
> ------------------------------------------
>
>                 Key: MAHOUT-2099
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-2099
>             Project: Mahout
>          Issue Type: Question
>          Components: cooccurrence, Math
>         Environment: Spark version 2.3.0.2.6.5.10-2
>  
> [EDIT] AP
>            Reporter: Tariq Jawed
>            Priority: Major
>
> I have a Spark cluster already set up. The environment is not under my direct 
> control, but fat JARs with bundled dependencies are allowed. I packaged my 
> Spark application with some Mahout code for SimilarityAnalysis, added the 
> Mahout libraries to the POM file, and the build packages successfully.
> The problem, however, is that I get the error below when using the existing 
> SparkContext to build a distributed Spark context for Mahout.
> [EDIT]AP:
> {code:xml}
> pom.xml
> {...}
> <dependency>
>   <groupId>org.apache.mahout</groupId>
>   <artifactId>mahout-math</artifactId>
>   <version>0.13.0</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.mahout</groupId>
>   <artifactId>mahout-math-scala_2.10</artifactId>
>   <version>0.13.0</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.mahout</groupId>
>   <artifactId>mahout-spark_2.10</artifactId>
>   <version>0.13.0</version>
> </dependency>
> <dependency>
>   <groupId>com.esotericsoftware</groupId>
>   <artifactId>kryo</artifactId>
>   <version>5.0.0-RC5</version>
> </dependency>
> {code}
>  
> Code:
> {code}
> implicit val sc: SparkContext = sparkSession.sparkContext
> implicit val msc: SparkDistributedContext = sc2sdc(sc)
> {code}
> Error:
> {code}
> ERROR TaskSetManager: Task 7.0 in stage 10.0 (TID 58) had a not serializable result: org.apache.mahout.math.DenseVector
> {code}
> And if I try to build the context using mahoutSparkContext(), it gives me an 
> error that MAHOUT_HOME was not found.
> Code:
> {code}
> implicit val msc = mahoutSparkContext(masterUrl = "local", appName = "CooccurrenceDriver")
> {code}
> Error:
> {code}
> MAHOUT_HOME is required to spawn mahout-based spark jobs
> {code}
> My question is: how do I proceed in this situation? Should I ask the 
> administrators of the Spark environment to install the Mahout library, or is 
> there any way I can proceed by packaging my application as a fat JAR?
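
A note on the MAHOUT_HOME error in the quoted description: mahoutSparkContext() wants MAHOUT_HOME so it can locate and ship the Mahout jars to the cluster. Since my application is a fat JAR that already bundles them, I am trying to skip that lookup instead. This is only a sketch; the addMahoutJars parameter is what I see in the sparkbindings source and I have not verified it against the 0.13.0 release:
{code:scala}
import org.apache.spark.SparkConf
import org.apache.mahout.sparkbindings._

// Sketch: let the fat JAR provide the Mahout classes and skip the
// MAHOUT_HOME-based jar lookup (addMahoutJars assumed from sparkbindings source).
implicit val msc = mahoutSparkContext(
  masterUrl = "local",
  appName = "CooccurrenceDriver",
  sparkConf = new SparkConf(),
  addMahoutJars = false)
{code}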



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
