[
https://issues.apache.org/jira/browse/MAHOUT-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Palumbo updated MAHOUT-2099:
-----------------------------------
Description:
I have a Spark cluster already set up; the environment is not under my direct control, but fat JARs with bundled dependencies are allowed. I packaged my Spark application, which uses Mahout code for SimilarityAnalysis, added the Mahout libraries to the POM file, and the build succeeds.
The problem, however, is that I get the following error when using the existing SparkContext to build a distributed Spark context for
Mahout
[EDIT]AP:
{code:xml}
pom.xml
{...}
<dependency>
<groupId>org.apache.mahout</groupId>
<artifactId>mahout-math</artifactId>
<version>0.13.0</version>
</dependency>
<dependency>
<groupId>org.apache.mahout</groupId>
<artifactId>mahout-math-scala_2.10</artifactId>
<version>0.13.0</version>
</dependency>
<dependency>
<groupId>org.apache.mahout</groupId>
<artifactId>mahout-spark_2.10</artifactId>
<version>0.13.0</version>
</dependency>
<dependency>
<groupId>com.esotericsoftware</groupId>
<artifactId>kryo</artifactId>
<version>5.0.0-RC5</version>
</dependency>
{code}
Code:
{code}
implicit val sc: SparkContext = sparkSession.sparkContext
implicit val msc: SparkDistributedContext = sc2sdc(sc)
{code}
Error:
{code}
ERROR TaskSetManager: Task 7.0 in stage 10.0 (TID 58) had a not serializable result: org.apache.mahout.math.DenseVector
{code}
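A possible direction, sketched under the assumption that the serialization failure comes from the default Java serializer being used for Mahout types: when the context is wrapped by hand with sc2sdc() rather than created through mahoutSparkContext(), the Kryo settings Mahout normally applies have to be supplied explicitly before the session is created. The registrator class name below comes from the mahout-spark bindings and should be verified against the 0.13.0 artifacts actually on the cluster:
{code}
// Sketch only: create the session with Kryo enabled and Mahout's registrator,
// so Mahout math types (e.g. DenseVector) can be shipped between executors.
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession
import org.apache.mahout.sparkbindings._   // sc2sdc, SparkDistributedContext

val sparkSession = SparkSession.builder()
  .appName("CooccurrenceDriver")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .config("spark.kryo.registrator",
          "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator")
  .getOrCreate()

implicit val sc: SparkContext = sparkSession.sparkContext
implicit val msc: SparkDistributedContext = sc2sdc(sc)
{code}
This only helps if the application controls session creation; if the cluster hands out an already-built session, the same two properties would have to be set in its Spark configuration instead.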
And if I try to build the context using mahoutSparkContext() instead, it gives me an error that MAHOUT_HOME is not found.
Code:
{code}
implicit val msc = mahoutSparkContext(masterUrl = "local", appName = "CooccurrenceDriver")
{code}
Error:
{code}
MAHOUT_HOME is required to spawn mahout-based spark jobs
{code}
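Similarly, a sketch for the second failure, assuming this Mahout version's mahoutSparkContext() exposes an addMahoutJars flag (treat that parameter as an assumption to check against the 0.13.0 sources): the MAHOUT_HOME lookup appears to be needed only to locate and ship Mahout's jars, which a fat JAR already provides.
{code}
// Sketch only: skip the MAHOUT_HOME-based jar discovery because the Mahout
// classes are already bundled in the fat JAR. 'addMahoutJars' is assumed to
// exist on this version's mahoutSparkContext() -- verify before relying on it.
import org.apache.mahout.sparkbindings._

implicit val msc = mahoutSparkContext(
  masterUrl = "local",
  appName = "CooccurrenceDriver",
  addMahoutJars = false)
{code}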
My question is: how do I proceed in this situation? Do I have to ask the administrators of the Spark environment to install the Mahout library, or is there any way I can proceed by packaging my application as a fat JAR?
> Using Mahout as a Library in Spark Cluster
> ------------------------------------------
>
> Key: MAHOUT-2099
> URL: https://issues.apache.org/jira/browse/MAHOUT-2099
> Project: Mahout
> Issue Type: Question
> Components: cooccurrence, Math
> Environment: Spark version 2.3.0.2.6.5.10-2
>
> [EDIT] AP
> Reporter: Tariq Jawed
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)