Re: Cluster sizing for recommendations

2015-07-28 Thread Xiangrui Meng
Hi Danny, you might need to reduce the number of partitions (or set userBlocks and productBlocks directly in ALS). Using a large number of partitions increases shuffle size and memory requirements. If you have 16 x 16 = 256 cores, I would recommend 64 or 128 instead of 2048.
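
For illustration only, here is a minimal sketch of setting the block counts directly, assuming the RDD-based ALS in org.apache.spark.mllib.recommendation. The function name trainWithFewerBlocks, the ratings input, and the rank, iteration, and lambda values are placeholders that do not come from this thread; only the block count of 64 follows the suggestion above.

```scala
import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}
import org.apache.spark.rdd.RDD

// Hypothetical helper: `ratings` stands in for whatever RDD[Rating]
// the job builds from the filtered events.
def trainWithFewerBlocks(ratings: RDD[Rating]): MatrixFactorizationModel = {
  new ALS()
    .setRank(10)          // placeholder value, not taken from the thread
    .setIterations(10)    // placeholder value, not taken from the thread
    .setLambda(0.01)      // placeholder value, not taken from the thread
    .setUserBlocks(64)    // 64 or 128 rather than 2048, per the advice above
    .setProductBlocks(64)
    .run(ratings)
}
```

Setting the blocks explicitly leaves the partitioning of the input RDD untouched. The other option mentioned above, reducing the number of partitions, could be done with something like ratings.coalesce(128) before training.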

Cluster sizing for recommendations

2015-07-06 Thread Danny Yates
Hi, I'm having trouble building a recommender and would appreciate a few pointers. I have 350,000,000 events which are stored in roughly 500,000 S3 files and are formatted as semi-structured JSON. These events are not all relevant to making recommendations. My code is (roughly): case class