For all three of these variables, you can use the ALTER SESSION or ALTER SYSTEM statements. See more here:

https://cwiki.apache.org/confluence/display/DRILL/SQL+Commands+Summary
https://cwiki.apache.org/confluence/display/DRILL/Planning+and+Execution+Options

Example usage:

  ALTER SESSION SET `planner.slice_target` = 100000;

On Mon, Sep 8, 2014 at 10:50 AM, Ted Dunning <[email protected]> wrote:

> Where are these variables best modified?
>
> On Mon, Sep 8, 2014 at 8:40 AM, Jacques Nadeau <[email protected]> wrote:
>
> > Drill's default behavior is to use estimates to determine the number of
> > files that will be written. The equation is fairly complicated. However,
> > there are three key variables that will impact file splits. These are:
> >
> > planner.slice_target: targeted number of records to allow within a
> > single slice before increasing parallelization (defaults to 1mm in 0.4,
> > 100k in 0.5)
> > planner.width.max_per_node: maximum number of slices run per node
> > (defaults to 0.7 * core count)
> > store.parquet.block-size: largest allowed row group when generating
> > Parquet files (defaults to 512mb)
> >
> > If you are getting more files than you would like, you can decrease
> > planner.width.max_per_node to a smaller number.
> >
> > It's likely that Jim Scott's experience with a smaller number of files
> > was due to running on a machine with fewer cores or the optimizer
> > estimating a smaller amount of data in the output. The behavior is data
> > and machine dependent.
> >
> > thanks,
> > Jacques
> >
> > On Mon, Sep 8, 2014 at 8:32 AM, Jim Scott <[email protected]> wrote:
> >
> > > I have created tables with Drill in Parquet format and it created 2
> > > files.
> > >
> > > On Fri, Sep 5, 2014 at 3:46 PM, Jim <[email protected]> wrote:
> > >
> > > > Actually, it looks like it always breaks it into 6 pieces by
> > > > default. Is there a way to make the partition size fixed rather
> > > > than the number of partitions?
> > > > On 09/05/2014 04:40 PM, Jim wrote:
> > > >
> > > > > Hello all,
> > > > >
> > > > > I've been experimenting with Drill to load data into Parquet
> > > > > files. I noticed rather large variability in the size of each
> > > > > Parquet chunk. Is there a way to control this?
> > > > >
> > > > > The documentation seems a little sparse on configuring some of
> > > > > the finer details. My apologies if I missed something obvious.
> > > > >
> > > > > Thanks,
> > > > > Jim
> > >
> > > --
> > > *Jim Scott*
> > > Director, Enterprise Strategy & Architecture
> > > MapR Technologies <http://www.mapr.com>
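[Editor's note: the options discussed in the thread above can be combined into a single session-level tuning pass. The sketch below uses the syntax from the Drill docs linked at the top; the values are illustrative, not recommendations, and the sys.options inspection query assumes a Drill build that exposes that system table.]

```sql
-- Inspect current values of the planner and Parquet writer options
-- (assumes the sys.options system table is available).
SELECT * FROM sys.options
WHERE name LIKE 'planner.%' OR name LIKE 'store.parquet.%';

-- Records allowed per slice before parallelization increases
-- (the 0.5 default is already 100000).
ALTER SESSION SET `planner.slice_target` = 100000;

-- Fewer slices per node means fewer output files.
ALTER SESSION SET `planner.width.max_per_node` = 2;

-- Largest allowed Parquet row group, in bytes
-- (default 512 MB; 268435456 = 256 MB).
ALTER SESSION SET `store.parquet.block-size` = 268435456;
```

ALTER SYSTEM takes the same form and applies the change to all sessions rather than only the current one.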
