RE: CSV to parquet preserving partitioning

2016-11-23 Thread benoitdr
dataframe to parquet table partitioned by dirs. It requires writing one's own parser. I could not find a solution to preserve the partitioning using sc.textFile or the databricks csv parser. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/CSV-to-parquet-preserving

RE: CSV to parquet preserving partitioning

2016-11-18 Thread benoitdr
the ETL phase.

RE: CSV to parquet preserving partitioning

2016-11-16 Thread neil90
hdfs://path/dir=dir1/part-r-xxx.gz.parquet
hdfs://path/dir=dir2/part-r-yyy.gz.parquet
hdfs://path/dir=dir3/part-r-zzz.gz.parquet
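The layout above uses key=value directory names, the convention Spark's partition discovery recognises when it reads a Parquet table back. Spark does this parsing itself; the sketch below only illustrates how the `dir` value can be recovered from such a path. `partition_values` is a hypothetical helper, not part of Spark.

```python
def partition_values(path):
    """Collect key=value directory components of a path into a dict."""
    values = {}
    # Skip the final component, which is the file name.
    for part in path.split("/")[:-1]:
        key, sep, value = part.partition("=")
        # Keep only components that actually look like key=value.
        if sep and key:
            values[key] = value
    return values


print(partition_values("hdfs://path/dir=dir1/part-r-xxx.gz.parquet"))
```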

RE: CSV to parquet preserving partitioning

2016-11-16 Thread benoitdr
ct: Re: CSV to parquet preserving partitioning Is there anything in the files to let you know which directory they should be in?

Re: CSV to parquet preserving partitioning

2016-11-16 Thread neil90
Is there anything in the files to let you know which directory they should be in?

RE: CSV to parquet preserving partitioning

2016-11-16 Thread Drooghaag, Benoit (Nokia - BE)
ogh...@nokia.com>
Cc: user <user@spark.apache.org>
Subject: Re: CSV to parquet preserving partitioning

Did you try unioning the datasets for each CSV into a single dataset? You may need to put the directory name into a column so you can partition by it.

On Tue, Nov 15, 2016 at 8:44
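A minimal PySpark sketch of the union suggestion quoted above, assuming Spark 2.x, one input directory of CSV files per partition, CSVs with a header row, and an identical schema across directories. The function names, the `dir` column name, and the `header=True` option are assumptions made for illustration; they do not come from the thread.

```python
from functools import reduce


def dir_name(path):
    """Return the last component of a directory path, e.g. 'dir1'."""
    return path.rstrip("/").rsplit("/", 1)[-1]


def csv_dirs_to_parquet(spark, input_dirs, output_path, column="dir"):
    """Read each CSV directory, tag its rows with the directory name,
    union the results, and write Parquet partitioned by that column."""
    # Imported here so dir_name() stays usable without pyspark installed.
    from pyspark.sql.functions import lit

    frames = [
        spark.read.csv(d, header=True).withColumn(column, lit(dir_name(d)))
        for d in input_dirs
    ]
    # union() matches columns by position, so the CSVs must share a schema.
    # input_dirs must be non-empty, otherwise reduce() raises TypeError.
    combined = reduce(lambda a, b: a.union(b), frames)
    combined.write.partitionBy(column).parquet(output_path)
```

With hypothetical input directories `hdfs://path/dir1` and `hdfs://path/dir2`, calling `csv_dirs_to_parquet(spark, [...], out)` should produce `dir=dir1/` and `dir=dir2/` subdirectories under `out`. Neither `union` nor `partitionBy` on write is expected to shuffle the data, but that is worth confirming on your own cluster.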

Re: CSV to parquet preserving partitioning

2016-11-15 Thread Daniel Siegmann
> lelism (and without shuffling the data in the cluster).
>
> Thanks!

CSV to parquet preserving partitioning

2016-11-15 Thread benoitdr
in context: http://apache-spark-user-list.1001560.n3.nabble.com/CSV-to-parquet-preserving-partitioning-tp28078.html Sent from the Apache Spark User List mailing list archive at Nabble.com.