For all three of these variables, you can use the ALTER SESSION or ALTER SYSTEM statements. See more here:

https://cwiki.apache.org/confluence/display/DRILL/SQL+Commands+Summary
https://cwiki.apache.org/confluence/display/DRILL/Planning+and+Execution+Options

Example usage:

  ALTER SESSION SET `planner.slice_target` = 100000;

On Mon, Sep 8, 2014 at 10:50 AM, Ted Dunning <[email protected]> wrote:

> Where are these variables best modified?
>
> On Mon, Sep 8, 2014 at 8:40 AM, Jacques Nadeau <[email protected]> wrote:
>
> > Drill's default behavior is to use estimates to determine the number of
> > files that will be written. The equation is fairly complicated. However,
> > there are three key variables that will impact file splits. These are:
> >
> > planner.slice_target: targeted number of records to allow within a
> > single slice before increasing parallelization (defaults to 1mm in 0.4,
> > 100k in 0.5)
> > planner.width.max_per_node: maximum number of slices run per node
> > (defaults to 0.7 * core count)
> > store.parquet.block-size: largest allowed row group when generating
> > Parquet files (defaults to 512mb)
> >
> > If you are getting more files than you would like, you can decrease
> > planner.width.max_per_node to a smaller number.
> >
> > It's likely that Jim Scott's experience with a smaller number of files
> > was due to running on a machine with fewer cores or the optimizer
> > estimating a smaller amount of data in the output. The behavior is data
> > and machine dependent.
> >
> > thanks,
> > Jacques
> >
> > On Mon, Sep 8, 2014 at 8:32 AM, Jim Scott <[email protected]> wrote:
> >
> > > I have created tables with Drill in Parquet format and it created 2
> > > files.
> > >
> > > On Fri, Sep 5, 2014 at 3:46 PM, Jim <[email protected]> wrote:
> > >
> > > > Actually, it looks like it always breaks it into 6 pieces by
> > > > default. Is there a way to make the partition size fixed rather
> > > > than the number of partitions?
> > > > On 09/05/2014 04:40 PM, Jim wrote:
> > > >
> > > > > Hello all,
> > > > >
> > > > > I've been experimenting with Drill to load data into Parquet
> > > > > files. I noticed rather large variability in the size of each
> > > > > Parquet chunk. Is there a way to control this?
> > > > >
> > > > > The documentation seems a little sparse on configuring some of
> > > > > the finer details. My apologies if I missed something obvious.
> > > > >
> > > > > Thanks,
> > > > > Jim
> > >
> > > --
> > > *Jim Scott*
> > > Director, Enterprise Strategy & Architecture
> > > MapR Technologies <http://www.mapr.com>
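[Editor's note: the options discussed in the thread above can be combined into a single session-level tuning pass. The sketch below uses the syntax from the Drill docs linked at the top; the values are illustrative, not recommendations, and the sys.options inspection query assumes a Drill build that exposes that system table.]

```sql
-- Inspect current values of the planner and Parquet writer options
-- (assumes the sys.options system table is available).
SELECT * FROM sys.options
WHERE name LIKE 'planner.%' OR name LIKE 'store.parquet.%';

-- Records allowed per slice before parallelization increases
-- (the 0.5 default is already 100000).
ALTER SESSION SET `planner.slice_target` = 100000;

-- Fewer slices per node means fewer output files.
ALTER SESSION SET `planner.width.max_per_node` = 2;

-- Largest allowed Parquet row group, in bytes
-- (default 512 MB; 268435456 = 256 MB).
ALTER SESSION SET `store.parquet.block-size` = 268435456;
```

ALTER SYSTEM takes the same form and applies the change to all sessions rather than only the current one.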
