Re: Opening many Parquet files = slow

2015-04-08 Thread Prashant Kommireddi
We noticed similar performance degradation using Parquet (outside of Spark), and it was caused by the merging of multiple schemas. It would be good to know whether disabling schema merging (when the schemas are identical), as Michael suggested, helps in your case.

On Wed, Apr 8, 2015 at 11:43 AM, Michael Armbrust
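For reference, a minimal sketch of what disabling schema merging looks like in later Spark releases (the data source option and the `spark.sql.parquet.mergeSchema` setting were introduced after this thread; the path below is illustrative, not from the thread):

```scala
// Assumes an existing SQLContext named sqlContext (Spark 1.4+ DataFrame reader API).
// Per-read override: read Parquet without merging part-file schemas.
val df = sqlContext.read
  .option("mergeSchema", "false")        // skip the footer scan/merge across files
  .parquet("hdfs:///path/to/parquet")    // illustrative path

// Or set it globally for the session (Spark 1.5+, where it defaults to false):
sqlContext.setConf("spark.sql.parquet.mergeSchema", "false")
```

With merging disabled, Spark picks a single representative file's schema instead of reading every file's footer, which is where the slowdown reported in this thread came from.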

Job submission API

2015-04-07 Thread Prashant Kommireddi
Hello folks, newbie here! Just a quick question: is there a job submission API, such as the one Hadoop provides at https://hadoop.apache.org/docs/r2.3.0/api/org/apache/hadoop/mapreduce/Job.html#submit(), for submitting Spark jobs to a YARN cluster? From the examples, I see that bin/spark-submit is what's provided out of the box.
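For readers finding this thread later: a programmatic launcher, SparkLauncher, was added to Spark around the 1.4 release. A minimal sketch (the jar path and main class below are illustrative placeholders, not real artifacts):

```scala
// Sketch only: requires spark-launcher on the classpath and a configured
// YARN cluster, so it is not runnable standalone.
import org.apache.spark.launcher.SparkLauncher

val proc = new SparkLauncher()
  .setAppResource("/path/to/app.jar")    // illustrative application jar
  .setMainClass("com.example.MyApp")     // illustrative main class
  .setMaster("yarn-cluster")
  .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
  .launch()                              // returns a java.lang.Process

proc.waitFor()                           // block until the launcher process exits
```

Under the hood this spawns a spark-submit child process, so it is closer to a programmatic wrapper around bin/spark-submit than to Hadoop's in-JVM Job.submit().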