Daniel Li created SPARK-20486:
---------------------------------
Summary: Encapsulate ALS in-block and out-block data structures
and methods into a separate class
Key: SPARK-20486
URL: https://issues.apache.org/jira/browse/SPARK-20486
Project: Spark
Issue Type: Improvement
Components: ML, MLlib
Affects Versions: 2.1.0
Reporter: Daniel Li
Priority: Trivial
The in-block and out-block data structures in the ALS code are currently
computed within the {{ALS.train}} method itself. I propose moving this
code, along with its helper functions, into a separate class that encapsulates
the creation of the blocks. This has the added benefit of letting us add
comprehensive Scaladoc to the new class, explaining in detail how this core
part of the algorithm works.
Proposal:
{code}
private[recommendation] final case class RatingBlocks[ID](
    userIn: RDD[(Int, InBlock[ID])],
    userOut: RDD[(Int, OutBlock)],
    itemIn: RDD[(Int, InBlock[ID])],
    itemOut: RDD[(Int, OutBlock)]
)

private[recommendation] object RatingBlocks {
  def create[ID: ClassTag: Ordering](
      ratings: RDD[Rating[ID]],
      numUserBlocks: Int,
      numItemBlocks: Int,
      storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK): RatingBlocks[ID] = {
    // In-block and out-block code currently in `ALS.train` goes here
  }

  private[this] def partitionRatings[ID: ClassTag](...) = { ... }

  private[this] def makeBlocks[ID: ClassTag](...) = { ... }
}
{code}
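To make the intent of the four fields concrete, here is a minimal sketch of the same shape using plain Scala collections instead of RDDs. It is not the ALS implementation: {{SimpleRating}}, {{Edge}}, {{SimpleRatingBlocks}}, and the modulo-based {{blockOf}} are hypothetical names and assumptions made for illustration, and the real {{InBlock}}/{{OutBlock}} use compact array encodings rather than maps.

```scala
object RatingBlocksSketch {
  // Hypothetical simplified rating with Int IDs (the proposal is generic in ID).
  final case class SimpleRating(user: Int, item: Int, rating: Float)

  // A rating seen from one side: src is the side being blocked, dst the other side.
  final case class Edge(src: Int, dst: Int, rating: Float)

  // In-blocks: src block ID -> ratings whose src ID falls in that block.
  type InBlocks = Map[Int, Vector[Edge]]
  // Out-blocks: src block ID -> (dst block ID -> src IDs whose factors that dst block needs).
  type OutBlocks = Map[Int, Map[Int, Set[Int]]]

  final case class SimpleRatingBlocks(
      userIn: InBlocks,
      userOut: OutBlocks,
      itemIn: InBlocks,
      itemOut: OutBlocks)

  // Assumption: IDs are assigned to blocks by non-negative modulo.
  private def blockOf(id: Int, numBlocks: Int): Int = {
    val m = id % numBlocks
    if (m < 0) m + numBlocks else m
  }

  // Builds the in-blocks and out-blocks for one side (users or items).
  private def makeBlocks(
      edges: Vector[Edge],
      numSrcBlocks: Int,
      numDstBlocks: Int): (InBlocks, OutBlocks) = {
    val in = edges.groupBy(e => blockOf(e.src, numSrcBlocks))
    val out = in.map { case (srcBlock, es) =>
      srcBlock -> es
        .groupBy(e => blockOf(e.dst, numDstBlocks))
        .map { case (dstBlock, sent) => dstBlock -> sent.map(_.src).toSet }
    }
    (in, out)
  }

  def create(
      ratings: Seq[SimpleRating],
      numUserBlocks: Int,
      numItemBlocks: Int): SimpleRatingBlocks = {
    val byUser = ratings.map(r => Edge(r.user, r.item, r.rating)).toVector
    val byItem = ratings.map(r => Edge(r.item, r.user, r.rating)).toVector
    val (userIn, userOut) = makeBlocks(byUser, numUserBlocks, numItemBlocks)
    val (itemIn, itemOut) = makeBlocks(byItem, numItemBlocks, numUserBlocks)
    SimpleRatingBlocks(userIn, userOut, itemIn, itemOut)
  }

  def main(args: Array[String]): Unit = {
    val ratings = Seq(
      SimpleRating(0, 10, 5f),
      SimpleRating(1, 11, 3f),
      SimpleRating(2, 11, 1f),
      SimpleRating(3, 10, 4f))
    val blocks = create(ratings, numUserBlocks = 2, numItemBlocks = 2)
    println(blocks.userOut)
    println(blocks.itemOut)
  }
}
```

With a single {{create}} entry point like this, a caller such as {{ALS.train}} would only need to destructure the four results, and the blocking logic could be documented and tested in one place.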