Where are these variables best modified?

On Mon, Sep 8, 2014 at 8:40 AM, Jacques Nadeau <[email protected]> wrote:

> Drill's default behavior is to use estimates to determine the number of
> files that will be written. The equation is fairly complicated. However,
> there are three key variables that will impact file splits. These are:
>
> planner.slice_target: targeted number of records to allow within a single
> slice before increasing parallelization (defaults to 1mm in 0.4, 100k in
> 0.5)
> planner.width.max_per_node: maximum number of slices run per node (defaults
> to 0.7 * core count)
> store.parquet.block-size: largest allowed row group when generating
> Parquet files. (defaults to 512mb)
>
> If you are having more files than you would like, you can
> decrease planner.width.max_per_node to a smaller number.
>
> It's likely that Jim Scott's experience with a smaller number of files was
> due to running on a machine with a smaller number of cores or the optimizer
> estimating a smaller amount of data in the output. The behavior is data
> and machine dependent.
>
> thanks,
> Jacques
>
>
> On Mon, Sep 8, 2014 at 8:32 AM, Jim Scott <[email protected]> wrote:
>
> > I have created tables with Drill in parquet format and it created 2
> > files.
> >
> >
> > On Fri, Sep 5, 2014 at 3:46 PM, Jim <[email protected]> wrote:
> >
> > > Actually, it looks like it always breaks it into 6 pieces by default. Is
> > > there a way to make the partition size fixed rather than the number of
> > > partitions?
> > >
> > >
> > > On 09/05/2014 04:40 PM, Jim wrote:
> > >
> > >> Hello all,
> > >>
> > >> I've been experimenting with drill to load data into Parquet files. I
> > >> noticed rather large variability in the size of each parquet chunk. Is
> > >> there a way to control this?
> > >>
> > >> The documentation seems a little sparse on configuring some of the finer
> > >> details. My apologies if I missed something obvious.
> > >>
> > >> Thanks
> > >> Jim
> > >>
> >
> > --
> > *Jim Scott*
> > Director, Enterprise Strategy & Architecture
> >
> > <http://www.mapr.com/>
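
For reference, the three options named in Jacques's reply are ordinary Drill options, so one way to change them is with ALTER SESSION (or ALTER SYSTEM) statements issued from sqlline or any other Drill client. The sketch below only renders the statement text and does not connect to Drill. The helper name drill_session_statements and the example values are made up for illustration, and the block size is assumed to be given in bytes; check the option names and units against your Drill version.

```python
def drill_session_statements(options):
    """Render one ALTER SESSION statement per option (hypothetical helper).

    Option names are wrapped in backticks because they contain dots and
    dashes, which Drill would otherwise not accept as a single identifier.
    """
    statements = []
    for name, value in options.items():
        if isinstance(value, bool):
            literal = "true" if value else "false"
        elif isinstance(value, (int, float)):
            literal = str(value)
        else:
            # String values are single-quoted, with embedded quotes doubled.
            literal = "'" + str(value).replace("'", "''") + "'"
        statements.append(f"ALTER SESSION SET `{name}` = {literal};")
    return statements


# Illustrative values only, not recommendations.
example_options = {
    "planner.width.max_per_node": 2,         # fewer slices per node, so fewer files
    "planner.slice_target": 1000000,         # records per slice before parallelizing
    "store.parquet.block-size": 268435456,   # assumed to be bytes (256 MB here)
}

for statement in drill_session_statements(example_options):
    print(statement)
```

Pasting the printed statements into a Drill session before running the CREATE TABLE AS query should apply them to that session only. Replacing SESSION with SYSTEM in the statement text would make the change apply to all sessions.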
