In Spark 1.4 and 1.5, you can do something like this:
df.write.partitionBy("key").parquet("/datasink/output-parquets")
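To make that concrete, here is a minimal, self-contained sketch of such a partitioned Parquet write as a Spark 1.4-era Scala application. The local master, the sample data, and the column name "key" are illustrative assumptions, not from this thread:

```scala
// Hedged sketch: a minimal Spark (1.4+) app doing a partitioned Parquet write.
// The local[*] master, the sample rows, and the output path are assumptions
// for illustration only.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object PartitionedParquetWrite {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("partitioned-write").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Small illustrative DataFrame with a "key" column to partition on.
    val df = Seq(("a", 1), ("a", 2), ("b", 3)).toDF("key", "value")

    // Each distinct value of "key" becomes its own subdirectory, e.g.
    //   /datasink/output-parquets/key=a/part-...
    //   /datasink/output-parquets/key=b/part-...
    df.write.partitionBy("key").parquet("/datasink/output-parquets")

    sc.stop()
  }
}
```

A nice side effect of this layout is that later reads which filter on "key" can skip whole directories (partition pruning).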
BTW, I'm curious: how did you do it without partitionBy when using
saveAsHadoopFile?
Cheng
On 9/8/15 2:34 PM, Adrien Mogenet wrote:
Hi there,
We've spent several hours to
My bad, I realized my question was unclear.
I did use partitionBy with saveAsHadoopFile. My question was about
doing the same thing for Parquet files. We were using Spark 1.3.x, but now
that we've upgraded to 1.4.1 I totally forgot this is possible :-)
Thanks for the answer, then!