[
https://issues.apache.org/jira/browse/MAHOUT-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17071385#comment-17071385
]
Andrew Palumbo commented on MAHOUT-2093:
----------------------------------------
Thank you again [~renedlog] for reporting and for your patience. I have an
answer prepared, but talked to [~pferrel] and want him to check this out.
Essentially, I believe (and tested to confirm) that the {{/lib}} directory is
not packing transitive (or even direct, I believe, in some cases) dependencies.
We used to have a fat jar that always ended up on the class-path from
{{/bin/mahout}}. This is a "fix the release" release, which is why we are seeing
these types of issues.
From 0.13.0 to 14.1 there was a total refactor and reorg of the code-base, so
we're still working out release kinks.
My previous answer, which was waiting on word from [~pferrel], follows. It is
my fault for not passing this on earlier: I'd asked him to take a look, and
didn't get it to him.
{quote}[~renedlog] yes, I've identified the issue. Dependencies and transitive
dependencies were not being added to {{$MAHOUT_HOME/lib}}. This is why you
were getting errors in several different areas. We've redone the {{pom.xml}}s in
a full refactor and have not had a chance to catch these issues yet. (It passes
tests locally and in {{TravisCI}} because when developing there is an
{{install}} goal in the build, so the local .m2 cache had the necessary
dependencies.)
bq. So, in short, {{/bin/mahout}} is not picking up _any_ dependencies because
the poms still have issues. It should be a relatively simple fix.
bq. I've had some personal issues to deal with these last few weeks, and build
issues are assigned to me, currently.
bq. The reason that this is in an RC is, as mentioned, that developers end up
with the dependencies in their/our {{~/.m2/cache}}, so the tests pass with e.g.
{{mvn clean package install}}; all dependencies are in my .m2 cache.
bq. As well, with the move away from MapReduce, Mahout is now intended more as
a library than a CLI. Though we do still provide CLI launchers for CCO and some
other algorithms, as well as traits to simply implement them, we prioritize the
library functionality higher during smoke testing, etc.
{quote}
In previous releases, we had a fat jar on the class-path, as well as {{./lib}}.
I overlooked the missing fat jar here.
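
The symptom above (a driver class that is absent from every jar the launcher
puts on the class-path) can be checked directly. The following is only a
minimal sketch, assuming a Python 3 interpreter is available and that the jars
of interest sit under one directory such as {{$MAHOUT_HOME/lib}}. The function
name {{find_class_in_jars}} is hypothetical and not part of Mahout.

```python
import sys
import zipfile
from pathlib import Path


def find_class_in_jars(lib_dir, class_name):
    """Return the paths of jars under lib_dir that contain class_name.

    class_name is a fully qualified name, for example
    org.apache.mahout.drivers.ItemSimilarityDriver.
    """
    # A class a.b.C is stored inside a jar as the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(lib_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(str(jar))
        except zipfile.BadZipFile:
            # Skip files named *.jar that are not valid archives.
            continue
    return hits


# Hypothetical command-line use:
#   python find_class.py <lib_dir> <fully.qualified.ClassName>
if __name__ == "__main__" and len(sys.argv) == 3:
    found = find_class_in_jars(sys.argv[1], sys.argv[2])
    if found:
        print("\n".join(found))
    else:
        print(f"{sys.argv[2]} not found under {sys.argv[1]}")
```

An empty result for {{org.apache.mahout.drivers.ItemSimilarityDriver}} against
the built {{lib}} directory would be consistent with the "Could not find or
load main class" errors in the report. It does not by itself show which pom
needs to change.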
We usually check on clusters when we have a more viable RC. As well, 14.1 is a
release to provide convenience binaries (i.e. in 0.14.0 we shipped no binary
artifacts), so I started releasing to Nexus pretty early in the cycle for that
reason.
You were right, this is not the Scopt issue that I'd originally thought; that
issue had been a thorn in our side for a long time.
Thank you very much for catching this and reporting it to us, as well as for
your patience. We'll get a fix out shortly to master, so that, at least,
> Mahout Source Broken
> --------------------
>
> Key: MAHOUT-2093
> URL: https://issues.apache.org/jira/browse/MAHOUT-2093
> Project: Mahout
> Issue Type: Bug
> Components: Algorithms, Collaborative Filtering, Documentation
> Affects Versions: 0.14.0, 0.13.2, 14.1
> Reporter: Stefan Goldener
> Priority: Blocker
> Fix For: 14.1
>
> Attachments: image-2020-03-12-07-10-34-731.png
>
>
> Seems like newer versions of Mahout do have problems with spark bindings e.g.
> mahout spark-itemsimilarity or mahout spark-rowsimilarity do not work due to
> class not found exceptions.
> {code:java}
> Error: Could not find or load main class org.apache.mahout.drivers.RowSimilarityDriver
> {code}
> {code:java}
> Error: Could not find or load main class org.apache.mahout.drivers.ItemSimilarityDriver
> {code}
> whereas *mahout spark-shell* works flawlessly.
> Here is a short Dockerfile to show the issue:
> {code:yaml}
> FROM openjdk:8-alpine
> ENV spark_uid=185
> ENV SCALA_MAJOR=2.11
> ENV SCALA_MAJOR_MINOR=2.11.12
> ENV HADOOP_MAJOR=2.7
> ENV SPARK_MAJOR_MINOR=2.4.5
> ENV MAHOUT_MAJOR_MINOR=0.14.0
> ENV MAHOUT_VERSION=mahout-${MAHOUT_MAJOR_MINOR}
> ENV MAHOUT_BASE=/opt/mahout
> ENV MAHOUT_HOME=${MAHOUT_BASE}/${MAHOUT_VERSION}
> ENV SPARK_VERSION=spark-${SPARK_MAJOR_MINOR}
> ENV SPARK_BASE=/opt/spark
> ENV SPARK_HOME=${SPARK_BASE}/${SPARK_VERSION}
> ENV MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g"
> ENV SPARK_SRC_URL="https://archive.apache.org/dist/spark/${SPARK_VERSION}/${SPARK_VERSION}.tgz"
> ENV MAHOUT_SRC_URL="https://archive.apache.org/dist/mahout/${MAHOUT_MAJOR_MINOR}/mahout-${MAHOUT_MAJOR_MINOR}-source-release.zip"
> ENV ZINC_PORT=3030
> ### build spark
> RUN set -ex && \
> apk upgrade --no-cache && \
> ln -s /lib /lib64 && \
> apk add --no-cache bash python py-pip tini libc6-compat linux-pam krb5 krb5-libs nss curl openssl git maven && \
> pip install setuptools && \
> mkdir -p ${MAHOUT_HOME} && \
> mkdir -p ${SPARK_BASE} && \
> curl -LfsS ${SPARK_SRC_URL} -o ${SPARK_HOME}.tgz && \
> tar -xzvf ${SPARK_HOME}.tgz -C ${SPARK_BASE}/ && \
> rm ${SPARK_HOME}.tgz && \
> export PATH=$PATH:$MAHOUT_HOME/bin:$MAHOUT_HOME/lib:$SPARK_HOME/bin:$JAVA_HOME/bin && \
> bash ${SPARK_HOME}/dev/change-scala-version.sh ${SCALA_MAJOR} && \
> bash ${SPARK_HOME}/dev/make-distribution.sh --name ${DATE}-${REVISION} --pip --tgz -DzincPort=${ZINC_PORT} \
> -Phadoop-${HADOOP_MAJOR} -Pkubernetes -Pkinesis-asl -Phive -Phive-thriftserver -Pscala-${SCALA_MAJOR}
>
> ### build mahout
> RUN curl -LfsS $MAHOUT_SRC_URL -o ${MAHOUT_BASE}.zip && \
> unzip ${MAHOUT_BASE}.zip -d ${MAHOUT_BASE} && \
> rm ${MAHOUT_BASE}.zip && \
> cd ${MAHOUT_HOME} && \
> mvn -Dspark.version=${SPARK_MAJOR_MINOR} -Dscala.version=${SCALA_MAJOR_MINOR} -Dscala.compat.version=${SCALA_MAJOR} -DskipTests -Dmaven.javadoc.skip=true clean package
> {code}
> docker build . -t mahout-test
> docker run -it mahout-test /bin/bash