Re: Split content into multiple Parquet files

2015-09-08 Thread Cheng Lian
In Spark 1.4 and 1.5, you can do something like this:

    df.write.partitionBy("key").parquet("/datasink/output-parquets")

BTW, I'm curious about how you did it without partitionBy using saveAsHadoopFile?

Cheng

On 9/8/15 2:34 PM, Adrien Mogenet wrote:
> Hi there, We've spent several hours to
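For context on what `partitionBy` produces: it writes one subdirectory per distinct value of the partition column, using Hive-style `key=value` naming, with that column's values encoded in the paths rather than in the Parquet files themselves. A minimal sketch of that layout in plain Python (no Spark required; `partition_paths` is a hypothetical helper for illustration, not a Spark API):

```python
import os
from collections import defaultdict

def partition_paths(rows, key, base="/datasink/output-parquets"):
    """Group rows by a column and return the Hive-style directory each
    group would land in, mirroring the layout produced by
    df.write.partitionBy(key).parquet(base)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    # Each distinct key value gets its own key=value subdirectory;
    # Parquet part files for that group are written underneath it.
    return {value: os.path.join(base, f"{key}={value}") for value in groups}

rows = [{"key": "a", "v": 1}, {"key": "b", "v": 2}, {"key": "a", "v": 3}]
print(partition_paths(rows, "key"))
# {'a': '/datasink/output-parquets/key=a', 'b': '/datasink/output-parquets/key=b'}
```

Readers that understand this convention (Spark itself, Hive, Impala) can then prune partitions at query time by filtering on `key`.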

Re: Split content into multiple Parquet files

2015-09-08 Thread Adrien Mogenet
My bad, I realized my question was unclear. I did a partitionBy when using saveAsHadoopFile; my question was about doing the same thing for Parquet files. We were using Spark 1.3.x, but now that we've updated to 1.4.1 I totally forgot this is possible :-) Thanks for the answer, then!