dataframe to parquet table partitioned by dirs
It seems to require writing a custom parser. I could not find a way to preserve
the partitioning using sc.textFile or the Databricks CSV parser.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/CSV-to-parquet-preserving-partitioning-tp28078p28103.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
-
hdfs://path/dir=dir1/part-r-xxx.gz.parquet
hdfs://path/dir=dir2/part-r-yyy.gz.parquet
hdfs://path/dir=dir3/part-r-zzz.gz.parquet
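A layout like the one above is what DataFrameWriter.partitionBy produces. A minimal sketch, assuming a DataFrame `df` that already carries a "dir" column (the output path is hypothetical):

```scala
// Writing with partitionBy("dir") yields one dir=<value> subdirectory
// per distinct value, as in the listing above.
df.write
  .partitionBy("dir")
  .option("compression", "gzip")   // gzip-compressed parquet parts
  .parquet("hdfs://path")
```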
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/CSV-to-parquet-preserving-partitioning-tp28078p28087.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Subject: Re: CSV to parquet preserving partitioning
Is there anything in the files to let you know which directory they should be
in?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/CSV-to-parquet-preserving-partitioning-tp28078p28083.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
ogh...@nokia.com>
Cc: user <user@spark.apache.org>
Subject: Re: CSV to parquet preserving partitioning
Did you try unioning the datasets for each CSV into a single dataset? You may
need to put the directory name into a column so you can partition by it.
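The suggestion above (one dataset per CSV directory, tag each with its directory name, then union) might look like the following sketch; the directory names and paths are hypothetical:

```scala
import org.apache.spark.sql.functions.lit

val dirs = Seq("dir1", "dir2", "dir3")     // hypothetical partition directories
val perDir = dirs.map { d =>
  spark.read.option("header", "true")
    .csv(s"hdfs://path/csv/$d")
    .withColumn("dir", lit(d))             // carry the directory name as a column
}
val all = perDir.reduce(_ union _)         // union is a metadata operation; no shuffle
all.write.partitionBy("dir").parquet("hdfs://path/parquet")
```

Because the "dir" values line up with the input directories, partitionBy writes each input directory's rows back under its own dir=<name> subdirectory.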
On Tue, Nov 15, 2016 at 8:44
> parallelism (and without
> shuffling the data in the cluster).
>
> Thanks !
>
>
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/CSV-to-parquet-preserving-partitioning-tp28078.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.