[ https://issues.apache.org/jira/browse/SYSTEMML-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
LI Guobao updated SYSTEMML-2418:
--------------------------------
    Summary: Spark data partitioner  (was: Distributing data to workers)

> Spark data partitioner
> ----------------------
>
>                 Key: SYSTEMML-2418
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2418
>             Project: SystemML
>          Issue Type: Sub-task
>            Reporter: LI Guobao
>            Assignee: LI Guobao
>            Priority: Major
>
> In the context of the parameter server (ps), the training data is
> partitioned according to the selected partitioning scheme. This conversion
> is executed on the driver node, and the partitioned data is then distributed
> to the workers via broadcast. Because a single Spark broadcast is limited to
> 2 GB, we could leverage the _PartitionedBroadcast_ class to do this
> conversion. Afterwards, the partitioned broadcast object can be passed to
> the workers so that each one can launch its job.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
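
The partition-then-broadcast idea in the description can be sketched as follows. This is only an illustration in plain Python, not SystemML or Spark code: the function names `partition_rows` and `chunk_for_broadcast`, the contiguous row-wise scheme, and the per-row byte estimate are all assumptions made for the example, and nothing here reflects the actual _PartitionedBroadcast_ API.

```python
from typing import List


def partition_rows(num_rows: int, num_workers: int) -> List[range]:
    """Split row indices into contiguous, disjoint ranges, one per worker.

    Hypothetical scheme: the issue only says the data is partitioned
    "according to" a scheme, without naming one.
    """
    base, extra = divmod(num_rows, num_workers)
    ranges, start = [], 0
    for worker in range(num_workers):
        # The first `extra` workers take one additional row each.
        size = base + (1 if worker < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def chunk_for_broadcast(num_rows: int, bytes_per_row: int,
                        max_chunk_bytes: int) -> List[range]:
    """Split rows into chunks whose estimated size stays within a byte limit.

    This mimics the idea of shipping a large object as several smaller
    pieces so that no single piece exceeds the broadcast size limit.
    """
    if bytes_per_row > max_chunk_bytes:
        raise ValueError("a single row exceeds the chunk limit")
    rows_per_chunk = max_chunk_bytes // bytes_per_row
    return [range(s, min(s + rows_per_chunk, num_rows))
            for s in range(0, num_rows, rows_per_chunk)]


if __name__ == "__main__":
    # 10 rows over 3 workers, then 10 rows of 8 bytes under a 32-byte limit.
    print(partition_rows(10, 3))
    print(chunk_for_broadcast(10, 8, 32))
```

In this sketch the driver would first decide which rows belong to which worker, then cut anything too large into pieces that each fit under the limit before sending them out.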