[
https://issues.apache.org/jira/browse/MAHOUT-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036477#comment-14036477
]
Hudson commented on MAHOUT-1573:
--------------------------------
SUCCESS: Integrated in Mahout-Quality #2665 (See
[https://builds.apache.org/job/Mahout-Quality/2665/])
MAHOUT-1573: More explicit parallelism adjustments in math-scala DRM apis;
elements of automatic parallelism management (dlyubimov: rev
3dd18344a47fb86b5127bcf3e051a2eb4e7ca996)
* spark/src/main/scala/org/apache/mahout/sparkbindings/SparkEngine.scala
* spark/src/test/scala/org/apache/mahout/sparkbindings/test/MahoutLocalContext.scala
* spark/src/main/scala/org/apache/mahout/sparkbindings/drm/DrmRddInput.scala
* spark/src/main/scala/org/apache/mahout/sparkbindings/blas/Par.scala
* CHANGELOG
* math-scala/src/main/scala/org/apache/mahout/math/drm/logical/AbstractBinaryOp.scala
* math-scala/src/main/scala/org/apache/mahout/math/drm/DrmLikeOps.scala
* math-scala/src/main/scala/org/apache/mahout/math/drm/DistributedEngine.scala
* spark/src/test/scala/org/apache/mahout/sparkbindings/drm/DrmLikeOpsSuite.scala
* spark/pom.xml
* math-scala/src/main/scala/org/apache/mahout/math/drm/logical/OpPar.scala
> More explicit parallelism adjustments in math-scala DRM apis; elements of
> automatic parallelism management
> ----------------------------------------------------------------------------------------------------------
>
> Key: MAHOUT-1573
> URL: https://issues.apache.org/jira/browse/MAHOUT-1573
> Project: Mahout
> Issue Type: Task
> Affects Versions: 0.9
> Reporter: Dmitriy Lyubimov
> Assignee: Dmitriy Lyubimov
> Fix For: 1.0
>
>
> (1) add minSplit parameter pass-thru to drmFromHDFS to be able to explicitly
> increase parallelism.
> (2) add a parallelism readjustment parameter to the checkpoint() call. This
> implies applying a shuffle-less coalesce() to the data set before it is
> requested to be cached (if specified).
> Going forward, we should probably try to figure out how to automate this, at
> least a little bit. For example, the simplest automatic adjustment might
> readjust parallelism on load to fit the cluster size (95% or 180% of cluster
> size, for example), with some rule-of-thumb safeguards, e.g. we cannot exceed
> a factor of, say, 8 (or whatever we configure) when splitting each original
> HDFS split. We should be able to get reasonable parallelism performance out
> of the box with simple heuristics like that.
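
The load-time heuristic described above could be sketched roughly as follows.
This is only an illustration under stated assumptions, not code from the
commit: the object ParallelismHeuristic, the function targetSplits and its
parameter names are all hypothetical, and "cluster size" is assumed to mean
the number of task slots available to the job.

```scala
// Hypothetical helper for illustration only; not part of the Mahout API.
object ParallelismHeuristic {

  /**
   * Suggests a split count for loading a data set.
   *
   * @param originalSplits number of HDFS splits the input has on its own
   * @param clusterSlots   assumed number of task slots in the cluster
   * @param loadFactor     fraction of the cluster to fill (e.g. 0.95 or 1.8)
   * @param maxSplitFactor safeguard: at most this many splits per original split
   */
  def targetSplits(
      originalSplits: Int,
      clusterSlots: Int,
      loadFactor: Double = 0.95,
      maxSplitFactor: Int = 8): Int = {
    require(originalSplits > 0, "originalSplits must be positive")
    require(clusterSlots > 0, "clusterSlots must be positive")
    require(loadFactor > 0.0, "loadFactor must be positive")
    require(maxSplitFactor >= 1, "maxSplitFactor must be at least 1")

    // Parallelism that would fit the cluster at the requested load factor.
    val desired = math.max(1L, math.ceil(clusterSlots * loadFactor).toLong)

    // Safeguard: never split an original HDFS split more than maxSplitFactor times.
    val cap = originalSplits.toLong * maxSplitFactor

    // Clamp to the Int range before converting back.
    math.min(math.min(desired, cap), Int.MaxValue.toLong).toInt
  }
}
```

For instance, with 4 original splits and 100 slots the safeguard applies and
the sketch returns 4 * 8 = 32 rather than a value near the cluster size. A
result below the original split count could not be reached through a minimum
split hint at load time; it would need something like the coalesce step
mentioned in (2).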