AW: [Spark R]: dapply only works for very small datasets

2017-11-29 Thread Kunft, Andreas

November 28, 2017 3:11 AM
Subject: AW: [Spark R]: dapply only works for very small datasets
To: Felix Cheung <felixcheun...@hotmail.com>, <user@spark.apache.org>

Thanks for the fast reply. I tried it locally, with 1 to 8 slots on an 8-core machine with 25 GB of memory, as well as

AW: [Spark R]: dapply only works for very small datasets

2017-11-28 Thread Kunft, Andreas

Thanks for the fast reply. I tried it locally, with 1 to 8 slots on an 8-core machine with 25 GB of memory, as well as on 4 nodes with the same specifications. When I shrink the data to around 100 MB, it runs in about 1 hour with 1 core and about 6 minutes with 8 cores. I'm aware that the SerDe takes
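Taking the reported timings for the ~100 MB run at face value (treating "about 1 hour" as 60 minutes, which is an approximation), the implied speedup $S_8$ and parallel efficiency $E_8$ going from 1 to 8 cores work out roughly as follows:

$$
S_8 = \frac{T_1}{T_8} \approx \frac{60\,\text{min}}{6\,\text{min}} = 10,
\qquad
E_8 = \frac{S_8}{8} \approx \frac{10}{8} = 1.25
$$

Because both timings are given only as "about", this is indicative only: it shows the 8-core run scales at least in proportion to the core count, while the absolute runtime for 100 MB remains very long.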