We have data stored in S3, partitioned by several columns. Let's say it
follows this hierarchy:
s3://bucket/data/column1=X/column2=Y/parquet-files
We run a Spark job on an EMR cluster (1 master, 3 slaves) and realised the
following:
A) - When we declare the initial dataframe to be the whole
The above is for EMR 5.5.0, Hadoop 2.7.3 and Spark 2.1.0.
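
For reference, here is a minimal sketch of the two ways such a layout can be
read; the SparkSession setup, the paths, and the basePath option are
illustrative assumptions, not the exact code from the job described above:

import org.apache.spark.sql.SparkSession

// Sketch only: names and paths are placeholders.
val spark = SparkSession.builder().appName("partitioned-parquet").getOrCreate()
import spark.implicits._

// Reading the root of the layout: Spark discovers column1 and column2 from
// the directory names (Hive-style partition discovery) and can prune
// partitions when they appear in filters.
val wholeDataset = spark.read.parquet("s3://bucket/data/")
val filtered = wholeDataset.filter($"column1" === "X" && $"column2" === "Y")

// Reading a single partition directory directly: only files under that
// prefix are listed, and the partition columns are not part of the schema
// unless basePath is supplied.
val onePartition = spark.read
  .option("basePath", "s3://bucket/data/")
  .parquet("s3://bucket/data/column1=X/column2=Y/")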