[
https://issues.apache.org/jira/browse/SPARK-6177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yuhao yang updated SPARK-6177:
------------------------------
Description:
Add a comment to the LDA example that introduces coalesce, to avoid the
potentially very large number of partitions created by sc.textFile.
sc.textFile creates an RDD with at least one partition per input file, so an
input made up of many small files yields a very large number of partitions,
which degrades LDA performance.
was: sc.textFile creates an RDD with one partition for each file, and the
resulting large number of partitions degrades LDA performance.
> LDA should check partitions size of the input
> ---------------------------------------------
>
> Key: SPARK-6177
> URL: https://issues.apache.org/jira/browse/SPARK-6177
> Project: Spark
> Issue Type: Improvement
> Components: Examples, MLlib
> Affects Versions: 1.2.1
> Reporter: yuhao yang
> Priority: Minor
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> Add a comment to the LDA example that introduces coalesce, to avoid the
> potentially very large number of partitions created by sc.textFile.
> sc.textFile creates an RDD with at least one partition per input file, so
> an input made up of many small files yields a very large number of
> partitions, which degrades LDA performance.
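
A minimal sketch of the kind of coalesce step the description refers to,
written against the PySpark RDD API (textFile, getNumPartitions, coalesce).
The helper names coalesce_if_needed and load_corpus and the default cap of 64
partitions are hypothetical, chosen only for illustration; they are not taken
from the LDA example itself.

```python
def coalesce_if_needed(rdd, max_partitions):
    """Return rdd unchanged, or merged down to at most max_partitions.

    Hypothetical helper: only relies on the RDD methods
    getNumPartitions() and coalesce(numPartitions).
    """
    if max_partitions < 1:
        raise ValueError("max_partitions must be at least 1")
    if rdd.getNumPartitions() > max_partitions:
        # coalesce merges existing partitions; by default it does so
        # without a full shuffle.
        return rdd.coalesce(max_partitions)
    return rdd


def load_corpus(sc, path, max_partitions=64):
    """Read text input and cap its partition count.

    sc is an existing SparkContext; 64 is an arbitrary assumed default
    and should be tuned to the cluster and corpus size.
    """
    # With many small files, textFile yields at least one partition per file.
    return coalesce_if_needed(sc.textFile(path), max_partitions)
```

For example, load_corpus(sc, "docs/*.txt") would leave an input with a few
partitions untouched and reduce an input with thousands of partitions to 64
before the documents are handed to LDA.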