Re: Spark ignoring partition names without equals (=) separator

Prasanna Santhanam Mon, 28 Nov 2016 21:20:07 -0800

On Mon, Nov 28, 2016 at 4:39 PM, Steve Loughran <ste...@hortonworks.com>
wrote:


>
> irrespective of naming, know that deep directory trees are performance
> killers when listing files on s3 and setting up jobs. You might actually be
> better off having them in the same directory and using a pattern like
> 2016-03-11-*
> as the pattern to find files.
>

Thanks Bharat and Steve - I've generally preferred the partitioned table
layout over a flat structure since it aids WHERE clause filtering
(predicate pushdown?). As for performance, that layout helps write-once,
query-many-times workloads. Changing the layout in our production
application that dumps these files is cumbersome. Is there a configuration
that would override this restriction in Spark? Does it make sense to have one?
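
To make the question concrete, here is a minimal Python sketch of the behaviour I have in mind: directory segments written as key=value are taken as-is, while bare segments get column names assigned by position. The function name, the bucket path, and the year/month/day column names are all hypothetical, and this is not how Spark implements partition discovery; it only illustrates the mapping I would like to configure.

```python
from typing import Dict, List


def parse_partitions(path: str, base: str, positional_names: List[str]) -> Dict[str, str]:
    """Map the directory segments between `base` and the file name to partition columns.

    Segments of the form key=value are used directly; bare segments are
    named from `positional_names` in order (a hypothetical convention).
    """
    if not path.startswith(base):
        raise ValueError(f"{path!r} is not under {base!r}")

    # Keep only the directories below the base path; the last segment is the file name.
    relative = path[len(base):].strip("/")
    segments = relative.split("/")[:-1]

    values: Dict[str, str] = {}
    names = iter(positional_names)
    for segment in segments:
        if "=" in segment:
            # Hive-style directory: the column name is in the path itself.
            key, _, value = segment.partition("=")
            values[key] = value
        else:
            # Bare directory: the column name has to be supplied by the caller.
            name = next(names, None)
            if name is None:
                raise ValueError(f"no column name supplied for segment {segment!r}")
            values[name] = segment
    return values


if __name__ == "__main__":
    base = "s3a://bucket/events"  # hypothetical location
    print(parse_partitions(base + "/2016/03/11/part-00000.parquet", base,
                           ["year", "month", "day"]))
    print(parse_partitions(base + "/year=2016/month=03/part-00000.parquet", base, []))
```

In a Spark job the same mapping could presumably be applied to the input file path of each row, but that is an assumption on my part and would not give the directory-level pruning that the key=value layout provides.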
