Re: [Spark Core] excessive read/load times on parquet files in 2.2 vs 2.0

2017-09-11 Thread Matthew Anthony
Any other feedback on this? On 9/8/17 11:00 AM, Neil Jonkers wrote: Can you provide a code sample please? On Fri, Sep 8, 2017 at 5:44 PM, Matthew Anthony <statm...@gmail.com> wrote: Hi all - since upgrading to 2.2.0, we've noticed a

Re: [Spark Core] excessive read/load times on parquet files in 2.2 vs 2.0

2017-09-08 Thread Matthew Anthony
you provide a code sample please? On Fri, Sep 8, 2017 at 5:44 PM, Matthew Anthony <statm...@gmail.com> wrote: Hi all - since upgrading to 2.2.0, we've noticed a significant increase in the time taken by read.parquet(...) operations. The parquet files are being

[Spark Core] excessive read/load times on parquet files in 2.2 vs 2.0

2017-09-08 Thread Matthew Anthony
Hi all - since upgrading to 2.2.0, we've noticed a significant increase in the time taken by read.parquet(...) operations. The parquet files are being read from S3. After the command is entered at the interactive terminal (pyspark in this case), the terminal sits "idle" for several minutes (as many as 10) before returning: