[ 
https://issues.apache.org/jira/browse/MAHOUT-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048499#comment-17048499
 ] 

Andrew Palumbo edited comment on MAHOUT-2093 at 3/1/20 8:59 AM:
----------------------------------------------------------------

This is an issue with the Scopt 3.3.0 CLI interface. On the current master 
(for v0.14.1) we've upgraded to Scopt v3.7.1, which has solved the problem.

The Mahout Spark Shell is handled differently in the call to `/bin/mahout`: 
it is a pass-through to Spark's Scala shell [1] with the Mahout jars added, 
so it does not use the Scopt CLI drivers [1][2][3]. That is why it works 
without issue in that release.
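
To make the difference concrete, here is a rough sketch of the two launch paths. This is NOT the actual contents of `bin/mahout` (see [1] for that); the function name `dispatch`, the install paths, and the jar names are all made up for illustration, and the script only prints the command it would run.

```shell
#!/bin/sh
# Hypothetical sketch only: this is not the real bin/mahout.
# SPARK_DIR and MAHOUT_JARS are illustrative values, not real defaults.
SPARK_DIR="/opt/spark"
MAHOUT_JARS="/opt/mahout/lib/a.jar,/opt/mahout/lib/b.jar"

# Print the command a launcher of this shape would run for a subcommand.
dispatch() {
  cmd="${1:-}"
  if [ "$#" -gt 0 ]; then shift; fi
  args="$*"
  case "$cmd" in
    spark-shell)
      # Pass-through: Spark's own shell handles the arguments, so no
      # Scopt option parser is involved on this path.
      echo "$SPARK_DIR/bin/spark-shell --jars $MAHOUT_JARS${args:+ $args}"
      ;;
    spark-itemsimilarity)
      # Driver path: a JVM main class that parses its arguments itself
      # (in Mahout, through the Scopt-based option parser [2][3]).
      cp="$(printf '%s' "$MAHOUT_JARS" | tr ',' ':')"
      echo "java -cp $cp org.apache.mahout.drivers.ItemSimilarityDriver${args:+ $args}"
      ;;
    *)
      echo "unknown command: $cmd" >&2
      return 1
      ;;
  esac
}
```

For example, `dispatch spark-shell --master local` prints a `spark-shell` invocation, while `dispatch spark-itemsimilarity` prints a `java -cp ...` invocation of the driver class, which is the path that depends on Scopt.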

0.14.1 is a huge refactor of the codebase, and we're still working out some 
of the kinks.

I would suggest the last RC, but I believe a module was missing from the 
source distribution, which is why we scrapped it.

CLI drivers should be working in the current {{github/master}}: 
[https://github.com/apache/mahout.git] which is currently (mostly) stable.
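
If a build still reports `Could not find or load main class`, one generic check is whether the driver class was packaged into any jar at all. The helper below is only a sketch: the name `find_class_in_jars` is made up, it assumes `unzip` is installed, and where the built jars end up depends on your build.

```shell
#!/bin/sh
# Hypothetical helper: list the jars under a directory that contain a class.
# Usage: find_class_in_jars <dir> <fully.qualified.ClassName>
# Returns non-zero when no jar contains the class.
find_class_in_jars() {
  dir="$1"
  # Convert the dotted class name to the entry path used inside a jar.
  entry="$(printf '%s' "$2" | tr '.' '/').class"
  find "$dir" -name '*.jar' 2>/dev/null | while IFS= read -r jar; do
    # unzip -l lists the archive contents without extracting anything.
    if unzip -l "$jar" 2>/dev/null | grep -qF "$entry"; then
      echo "$jar"
    fi
  done | grep .
}
```

For example, `find_class_in_jars "$MAHOUT_HOME" org.apache.mahout.drivers.ItemSimilarityDriver` prints each matching jar; no output and a non-zero exit status suggest that the module containing the drivers was not built or not packaged.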

[1] [https://github.com/apache/mahout/blob/branch-0.14.0/bin/mahout#L299-L314]
[2] [https://github.com/apache/mahout/blob/branch-0.14.0/community/spark-cli-drivers/src/main/scala/org/apache/mahout/drivers/ItemSimilarityDriver.scala#L44]
[3] [https://github.com/apache/mahout/blob/branch-0.14.0/community/spark-cli-drivers/src/main/scala/org/apache/mahout/drivers/MahoutOptionParser.scala#L30]


> Mahout Source Broken
> --------------------
>
>                 Key: MAHOUT-2093
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-2093
>             Project: Mahout
>          Issue Type: Bug
>          Components: Algorithms, Collaborative Filtering, Documentation
>    Affects Versions: 0.14.0, 0.13.2
>            Reporter: Stefan Goldener
>            Priority: Blocker
>
> It seems that newer versions of Mahout have problems with the Spark 
> bindings: for example, mahout spark-itemsimilarity and mahout 
> spark-rowsimilarity do not work due to class-not-found errors. 
> {code:java}
> Error: Could not find or load main class org.apache.mahout.drivers.RowSimilarityDriver
> {code}
> {code:java}
> Error: Could not find or load main class org.apache.mahout.drivers.ItemSimilarityDriver
> {code}
> whereas *mahout spark-shell* works flawlessly.
> Here is a short Dockerfile to show the issue:
> {code:yaml}
> FROM openjdk:8-alpine
> ENV spark_uid=185
> ENV SCALA_MAJOR=2.11
> ENV SCALA_MAJOR_MINOR=2.11.12
> ENV HADOOP_MAJOR=2.7
> ENV SPARK_MAJOR_MINOR=2.4.5
> ENV MAHOUT_MAJOR_MINOR=0.14.0
> ENV MAHOUT_VERSION=mahout-${MAHOUT_MAJOR_MINOR}
> ENV MAHOUT_BASE=/opt/mahout
> ENV MAHOUT_HOME=${MAHOUT_BASE}/${MAHOUT_VERSION}
> ENV SPARK_VERSION=spark-${SPARK_MAJOR_MINOR}
> ENV SPARK_BASE=/opt/spark
> ENV SPARK_HOME=${SPARK_BASE}/${SPARK_VERSION}
> ENV MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g"
> ENV SPARK_SRC_URL="https://archive.apache.org/dist/spark/${SPARK_VERSION}/${SPARK_VERSION}.tgz"
> ENV MAHOUT_SRC_URL="https://archive.apache.org/dist/mahout/${MAHOUT_MAJOR_MINOR}/mahout-${MAHOUT_MAJOR_MINOR}-source-release.zip"
> ENV ZINC_PORT=3030
> ### build spark
> RUN set -ex && \
>     apk upgrade --no-cache && \
>     ln -s /lib /lib64 && \
>     apk add --no-cache bash python py-pip tini libc6-compat linux-pam krb5 krb5-libs nss curl openssl git maven && \
>     pip install setuptools && \
>     mkdir -p ${MAHOUT_HOME} && \
>     mkdir -p ${SPARK_BASE} && \
>     curl  -LfsS ${SPARK_SRC_URL} -o ${SPARK_HOME}.tgz  && \
>     tar -xzvf ${SPARK_HOME}.tgz -C ${SPARK_BASE}/ && \
>     rm ${SPARK_HOME}.tgz && \
>     export PATH=$PATH:$MAHOUT_HOME/bin:$MAHOUT_HOME/lib:$SPARK_HOME/bin:$JAVA_HOME/bin && \
>     bash ${SPARK_HOME}/dev/change-scala-version.sh ${SCALA_MAJOR} && \
>     bash ${SPARK_HOME}/dev/make-distribution.sh --name ${DATE}-${REVISION} --pip --tgz -DzincPort=${ZINC_PORT} \
>             -Phadoop-${HADOOP_MAJOR} -Pkubernetes -Pkinesis-asl -Phive -Phive-thriftserver -Pscala-${SCALA_MAJOR}
>     
> ### build mahout
> RUN curl -LfsS $MAHOUT_SRC_URL -o ${MAHOUT_BASE}.zip  && \
>     unzip ${MAHOUT_BASE}.zip -d ${MAHOUT_BASE} && \
>     rm ${MAHOUT_BASE}.zip && \
>     cd ${MAHOUT_HOME} && \
>     mvn -Dspark.version=${SPARK_MAJOR_MINOR} -Dscala.version=${SCALA_MAJOR_MINOR} -Dscala.compat.version=${SCALA_MAJOR} \
>         -DskipTests -Dmaven.javadoc.skip=true clean package
> {code}
> docker build . -t mahout-test
>  docker run -it mahout-test /bin/bash


