Hi Danny,
You might need to reduce the number of partitions (or set userBlocks
and productBlocks directly in ALS). Using a large number of partitions
increases shuffle size and memory requirements. If you have 16 x 16 =
256 cores, I would recommend 64 or 128 instead of 2048.
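
For reference, setting the block counts directly might look roughly like
the sketch below. It assumes the RDD-based MLlib ALS and an existing
RDD[Rating]; the helper name trainWithFewerBlocks and the rank, iteration
and lambda values are placeholders of mine, not taken from your code or
tuned for your data.

```scala
import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}
import org.apache.spark.rdd.RDD

// Hypothetical helper: `ratings` stands in for whatever RDD[Rating] you already build.
def trainWithFewerBlocks(ratings: RDD[Rating]): MatrixFactorizationModel = {
  new ALS()
    .setRank(10)           // placeholder value, not a recommendation
    .setIterations(10)     // placeholder value
    .setLambda(0.01)       // placeholder value
    .setUserBlocks(64)     // far fewer blocks than 2048 partitions
    .setProductBlocks(64)
    .run(ratings)
}
```

If you would rather keep the default block settings, calling
ratings.coalesce(128) before training should have a similar effect, since
the block count defaults to the number of partitions of the input.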
Hi,
I'm having trouble building a recommender and would appreciate a few
pointers.
I have 350,000,000 events which are stored in roughly 500,000 S3 files and
are formatted as semi-structured JSON. These events are not all relevant to
making recommendations.
My code is (roughly):
case class