Cool.
On Mon, Sep 8, 2014 at 11:01 AM, Jacques Nadeau <[email protected]> wrote:

> For all three of these variables, you can use the ALTER SESSION or ALTER
> SYSTEM statements. See more here:
>
> https://cwiki.apache.org/confluence/display/DRILL/SQL+Commands+Summary
> https://cwiki.apache.org/confluence/display/DRILL/Planning+and+Execution+Options
>
> Example usage:
>
> ALTER SESSION SET `planner.slice_target` = 100000;
>
> On Mon, Sep 8, 2014 at 10:50 AM, Ted Dunning <[email protected]> wrote:
>
> > Where are these variables best modified?
> >
> > On Mon, Sep 8, 2014 at 8:40 AM, Jacques Nadeau <[email protected]> wrote:
> >
> > > Drill's default behavior is to use estimates to determine the number
> > > of files that will be written. The equation is fairly complicated.
> > > However, there are three key variables that impact file splits:
> > >
> > > planner.slice_target: targeted number of records allowed within a
> > > single slice before increasing parallelization (defaults to 1mm in
> > > 0.4, 100k in 0.5)
> > > planner.width.max_per_node: maximum number of slices run per node
> > > (defaults to 0.7 * core count)
> > > store.parquet.block-size: largest allowed row group when generating
> > > Parquet files (defaults to 512mb)
> > >
> > > If you are getting more files than you would like, you can decrease
> > > planner.width.max_per_node to a smaller number.
> > >
> > > It's likely that Jim Scott's experience with a smaller number of
> > > files was due to running on a machine with fewer cores or the
> > > optimizer estimating a smaller amount of data in the output. The
> > > behavior is data and machine dependent.
> > >
> > > thanks,
> > > Jacques
> > >
> > > On Mon, Sep 8, 2014 at 8:32 AM, Jim Scott <[email protected]> wrote:
> > >
> > > > I have created tables with Drill in Parquet format and it created
> > > > 2 files.
> > > > On Fri, Sep 5, 2014 at 3:46 PM, Jim <[email protected]> wrote:
> > > >
> > > > > Actually, it looks like it always breaks it into 6 pieces by
> > > > > default. Is there a way to make the partition size fixed rather
> > > > > than the number of partitions?
> > > > >
> > > > > On 09/05/2014 04:40 PM, Jim wrote:
> > > > >
> > > > >> Hello all,
> > > > >>
> > > > >> I've been experimenting with Drill to load data into Parquet
> > > > >> files. I noticed rather large variability in the size of each
> > > > >> Parquet chunk. Is there a way to control this?
> > > > >>
> > > > >> The documentation seems a little sparse on configuring some of
> > > > >> the finer details. My apologies if I missed something obvious.
> > > > >>
> > > > >> Thanks
> > > > >> Jim
> > > >
> > > > --
> > > > *Jim Scott*
> > > > Director, Enterprise Strategy & Architecture
> > > > MapR Technologies <http://www.mapr.com>
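[Editor's note] Jacques's explanation of how the three options interact can be sketched roughly in Python. This is a simplified illustration only: the function name and the formula are assumptions for clarity, not Drill's actual planner code, which Jacques notes is "fairly complicated".

```python
import math

def estimate_output_files(estimated_rows, nodes, cores_per_node,
                          slice_target=100_000,        # planner.slice_target (0.5 default)
                          width_per_node_factor=0.7):  # planner.width.max_per_node default factor
    """Rough, illustrative estimate of CTAS output file count (not Drill's real equation)."""
    # Parallelization grows until each slice holds at most slice_target rows...
    slices_wanted = math.ceil(estimated_rows / slice_target)
    # ...but is capped by the per-node width limit across the cluster.
    max_width = nodes * max(1, int(width_per_node_factor * cores_per_node))
    slices = max(1, min(slices_wanted, max_width))
    # Each writing slice emits at least one file; a slice can emit more
    # if its output exceeds store.parquet.block-size (not modeled here).
    return slices
```

Under this sketch, a 1M-row table written from a single 8-core node caps out at `int(0.7 * 8) = 5` slices, so about 5 files; lowering `planner.width.max_per_node` (or raising `planner.slice_target`) reduces the count.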
