November 28, 2017 3:11 AM
Subject: AW: [Spark R]: dapply only works for very small datasets
To: Felix Cheung <felixcheun...@hotmail.com>, <user@spark.apache.org>
Thanks for the fast reply.
I tried it locally, with 1 to 8 slots on an 8-core machine with 25 GB of memory, as well
as on 4 nodes with the same specifications.
When I shrink the data to around 100 MB,
it runs in about 1 hour on 1 core and about 6 minutes on 8 cores.
I'm aware that the serDe takes