[
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968587#comment-13968587
]
Dmitriy Lyubimov commented on MAHOUT-1464:
------------------------------------------
Running through the Spark Client (inside the cluster) is new in 0.9. Even
assuming it is stable, it is not supported at this point, and going this way
will run into multiple hurdles.
For one, the Mahout Spark context requires MAHOUT_HOME in order to set up all
Mahout binaries properly. The assumption is that Mahout's binaries are needed
only on the driver's side, but if the driver runs inside a remote cluster, this
will fail. So our batches should really be started in one of the ways I
described in my earlier email.
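To make the MAHOUT_HOME point concrete, here is a minimal pre-flight sketch one
could run on whatever host ends up running the driver. It is not Mahout code:
the helper name find_mahout_jars is hypothetical, and scanning MAHOUT_HOME
recursively for *.jar files is only an assumption about how the binaries are
laid out.

```python
import os
from pathlib import Path


def find_mahout_jars(env=None):
    """Return the jar paths found under MAHOUT_HOME.

    Hypothetical helper: raises RuntimeError if MAHOUT_HOME is unset or
    does not point to a directory on this host.
    """
    env = os.environ if env is None else env
    home = env.get("MAHOUT_HOME")
    if not home:
        raise RuntimeError("MAHOUT_HOME is not set on the driver host")
    root = Path(home)
    if not root.is_dir():
        raise RuntimeError("MAHOUT_HOME is not a directory: " + home)
    # Assumption: the binaries are ordinary jar files somewhere below MAHOUT_HOME.
    return sorted(str(p) for p in root.rglob("*.jar"))


if __name__ == "__main__":
    try:
        for jar in find_mahout_jars():
            print(jar)
    except RuntimeError as err:
        print("pre-flight check failed:", err)
```

If the driver is launched inside the cluster, this check would have to pass on
the cluster node that hosts the driver, not on the submitting machine.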
Second, I don't think the driver can load classes reliably, because it includes
Mahout dependencies such as mahout-math. That's another reason why using the
Client seems problematic to me: it assumes one has the _entire_ application
within that jar, which is not true here.
That said, your attempt doesn't exhibit any direct ClassNotFound errors and
looks more like akka communication issues, i.e. Spark setup issues. One thing
about Spark is that it requires direct port connectivity not only between
cluster nodes but also back to the client. In particular, this means your
client must not firewall incoming connections and must not be behind NAT (even
port forwarding doesn't really solve the networking issues here). So my first
bet would be on akka connectivity issues from the cluster back to the client.
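A quick way to test the "back to the client" direction is a plain TCP probe run
from a cluster node against the client machine. The sketch below is only an
illustration: can_connect is a hypothetical helper, and the host and port are
placeholders. It assumes you know which port the driver listens on, e.g.
because it was pinned via spark.driver.port rather than left to be chosen at
random.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unroutable, or name resolution failed.
        return False


if __name__ == "__main__":
    # Placeholders: substitute the client's address as seen from the worker
    # and the port the driver is actually listening on.
    client_host = "client.example.com"
    driver_port = 7078
    print("reachable" if can_connect(client_host, driver_port) else "NOT reachable")
```

A successful probe only shows that the port is open from that one node; it does
not rule out NAT problems where the driver advertises an address the workers
cannot route to.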
> Cooccurrence Analysis on Spark
> ------------------------------
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Environment: hadoop, spark
> Reporter: Pat Ferrel
> Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch,
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM
> can be used as input.
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has
> several applications including cross-action recommendations.