Thanks for clarifying. In the spill directory under a specific minor fragment, I saw files with different sizes: a few files are just 50MB each and a few are 1.5GB. It's not clear to me why the file sizes are so different.
- Rahul

On Wed, Aug 26, 2015 at 5:48 PM, Steven Phillips <[email protected]> wrote:

> That's not really how it works. The only "spilling" to disk occurs during
> External Sort, and the spill files are not created based on partition.
>
> What makes you think it is spilling prematurely?
>
> On Wed, Aug 26, 2015 at 5:15 PM, rahul challapalli <
> [email protected]> wrote:
>
> > Steven, Jason:
> >
> > Below is my understanding of when we should spill to disk while
> > performing a sort. Let me know if I am missing anything.
> >
> > alter session set `planner.width.max_per_node` = 4;
> > alter session set `planner.memory.max_query_memory_per_node` = 8147483648; (~8GB)
> > create table lineitem partition by (l_shipdate, l_receiptdate) as select *
> > from dfs.`/drill/testdata/tpch100/lineitem`;
> >
> > 1. The above query creates 4 minor fragments, and each minor fragment gets
> > ~2GB for the sort phase.
> > 2. Once a minor fragment consumes ~2GB of memory, it starts spilling each
> > partition into a separate file on disk.
> > 3. The spilled files would be of different sizes.
> > 4. For a regular CTAS (with no partition by clause), each spilled file
> > should be approximately ~2GB in size.
> >
> > I just have a hunch that we are spilling a little early :)
> >
> > - Rahul
> >
> > On Wed, Aug 26, 2015 at 4:49 PM, rahul challapalli <
> > [email protected]> wrote:
> >
> > > Jason,
> > >
> > > What you described is exactly my understanding.
> > >
> > > I did kick off a run after setting `store.partition.hash_distribute`. It
> > > is still running. I am expecting the number of files to be slightly more
> > > than or equal to 75780. (The default parquet block size should be
> > > sufficient for most of the partitions.)
> > >
> > > - Rahul
> > >
> > > On Wed, Aug 26, 2015 at 4:36 PM, Jason Altekruse <
> > > [email protected]> wrote:
> > >
> > >> I feel like there is a little misunderstanding here.
> > >> Rahul, did you try setting the option that Steven suggested?
> > >> `store.partition.hash_distribute`
> > >>
> > >> This will cause a re-distribution of the data so that the rows that
> > >> belong in a particular partition will all be written by a single writer.
> > >> They will not necessarily all be in one file, as we have a limit on file
> > >> sizes and I don't think we cap partition size.
> > >>
> > >> The default behavior is not to re-distribute, because it is expensive.
> > >> This, however, means that every fragment will write out a file for
> > >> whichever keys appear in the data that ends up at that fragment.
> > >>
> > >> If there is a large number of fragments and the data is spread out
> > >> pretty randomly, then there is a reasonable case for turning on this
> > >> option to co-locate the data in a single partition with a single writer
> > >> and reduce the number of smaller files. There is no magic formula for
> > >> when it is best to turn on this option, but in most cases it will reduce
> > >> the number of files produced.
> > >>
> > >> On Wed, Aug 26, 2015 at 3:48 PM, rahul challapalli <
> > >> [email protected]> wrote:
> > >>
> > >> > Well, this is for generating some test data.
> > >> >
> > >> > - Rahul
> > >> >
> > >> > On Wed, Aug 26, 2015 at 3:47 PM, Andries Engelbrecht <
> > >> > [email protected]> wrote:
> > >> >
> > >> > > Looks like Drill is doing the partitioning as requested then. It may
> > >> > > not be optimal though.
> > >> > >
> > >> > > Is there a reason why you want to subpartition this much? You may be
> > >> > > better off just partitioning by l_shipdate (not shipmate, autocorrect
> > >> > > got me there). Or use columns with much lower cardinality to test
> > >> > > subpartitioning.
> > >> > > —Andries
> > >> > >
> > >> > > > On Aug 26, 2015, at 3:05 PM, rahul challapalli <
> > >> > > > [email protected]> wrote:
> > >> > > >
> > >> > > > Steven,
> > >> > > >
> > >> > > > You were right. The count is 606240, which is 8 * 75780.
> > >> > > >
> > >> > > > Stefan & Andries,
> > >> > > >
> > >> > > > Below is the distinct count (cardinality):
> > >> > > >
> > >> > > > select count(*) from (select l_shipdate, l_receiptdate from
> > >> > > > dfs.`/drill/testdata/tpch100/lineitem`
> > >> > > > group by l_shipdate, l_receiptdate) sub;
> > >> > > > +---------+
> > >> > > > | EXPR$0  |
> > >> > > > +---------+
> > >> > > > | 75780   |
> > >> > > > +---------+
> > >> > > >
> > >> > > > - Rahul
> > >> > > >
> > >> > > > On Wed, Aug 26, 2015 at 1:26 PM, Andries Engelbrecht <
> > >> > > > [email protected]> wrote:
> > >> > > >
> > >> > > >> What is the distinct count for these columns? IIRC TPC-H has at
> > >> > > >> least 5 years of data irrespective of SF, so you are requesting a
> > >> > > >> lot of partitions. 76K sounds about right for 5 years of TPC-H
> > >> > > >> l_shipdate and correlating receipt date data; your query doesn't
> > >> > > >> count the actual files.
> > >> > > >>
> > >> > > >> Try to partition just on the l_shipdate column first.
> > >> > > >>
> > >> > > >> —Andries
> > >> > > >>
> > >> > > >>> On Aug 26, 2015, at 12:34 PM, Stefán Baxter <
> > >> > > >>> [email protected]> wrote:
> > >> > > >>>
> > >> > > >>> Hi,
> > >> > > >>>
> > >> > > >>> Is it possible that the combination of values of (l_shipdate,
> > >> > > >>> l_receiptdate) has a very high cardinality?
> > >> > > >>> I would think you are creating partition files for a small subset
> > >> > > >>> of the data.
> > >> > > >>> Please keep in mind that I know nothing about TPCH SF100 and
> > >> > > >>> only a little about Drill :).
> > >> > > >>>
> > >> > > >>> Regards,
> > >> > > >>> -Stefan
> > >> > > >>>
> > >> > > >>> On Wed, Aug 26, 2015 at 7:30 PM, Steven Phillips <
> > >> > > >>> [email protected]> wrote:
> > >> > > >>>
> > >> > > >>>> It would be helpful if you could figure out what the file count
> > >> > > >>>> is. But here are some thoughts:
> > >> > > >>>>
> > >> > > >>>> What is the value of the option:
> > >> > > >>>> store.partition.hash_distribute
> > >> > > >>>>
> > >> > > >>>> If it is false, which it is by default, then every fragment will
> > >> > > >>>> potentially have data in every partition. In this case, that
> > >> > > >>>> could increase the number of files by a factor of 8.
> > >> > > >>>>
> > >> > > >>>> On Wed, Aug 26, 2015 at 12:21 PM, rahul challapalli <
> > >> > > >>>> [email protected]> wrote:
> > >> > > >>>>
> > >> > > >>>>> Drillers,
> > >> > > >>>>>
> > >> > > >>>>> I executed the below query on TPCH SF100 with Drill and it took
> > >> > > >>>>> ~2hrs to complete on a 2 node cluster.
> > >> > > >>>>>
> > >> > > >>>>> alter session set `planner.width.max_per_node` = 4;
> > >> > > >>>>> alter session set `planner.memory.max_query_memory_per_node` = 8147483648;
> > >> > > >>>>> create table lineitem partition by (l_shipdate, l_receiptdate)
> > >> > > >>>>> as select * from dfs.`/drill/testdata/tpch100/lineitem`;
> > >> > > >>>>>
> > >> > > >>>>> The below query returned 75780, so I expected Drill to create
> > >> > > >>>>> the same number of files, or maybe a little more. But Drill
> > >> > > >>>>> created so many files that a "hadoop fs -count" command failed
> > >> > > >>>>> with a "GC overhead limit exceeded" error.
> > >> > > >>>>> (I did not change the default parquet block size.)
> > >> > > >>>>>
> > >> > > >>>>> select count(*) from (select l_shipdate, l_receiptdate from
> > >> > > >>>>> dfs.`/drill/testdata/tpch100/lineitem`
> > >> > > >>>>> group by l_shipdate, l_receiptdate) sub;
> > >> > > >>>>> +---------+
> > >> > > >>>>> | EXPR$0  |
> > >> > > >>>>> +---------+
> > >> > > >>>>> | 75780   |
> > >> > > >>>>> +---------+
> > >> > > >>>>>
> > >> > > >>>>> Any thoughts on why Drill is creating so many files?
> > >> > > >>>>>
> > >> > > >>>>> - Rahul
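[Editor's sketch] The file-count arithmetic in the thread can be checked directly: with `store.partition.hash_distribute` left at its default of false, each minor fragment writes its own file for every partition key it sees, so the worst case is fragments times distinct keys. The "8" here presumably comes from `planner.width.max_per_node` = 4 on a 2 node cluster, which the thread implies but does not state outright.

```python
# Worst-case file count when every fragment writes a file per
# partition key it encounters (the behavior Jason and Steven
# describe for hash_distribute = false). Illustration only,
# not Drill code.
num_fragments = 8        # presumably 4 per node x 2 nodes
distinct_keys = 75780    # distinct (l_shipdate, l_receiptdate) pairs

worst_case_files = num_fragments * distinct_keys
print(worst_case_files)  # 606240, matching the count Rahul observed
```

With hash distribution enabled, rows for a given key are routed to a single writer, so the count drops back toward the 75780 distinct keys, plus extra files wherever a partition exceeds the parquet block size.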

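[Editor's sketch] The ~2GB-per-fragment sort budget Rahul cites works out as follows, assuming (as his email does, without confirmation from the Drill side) that the query memory per node is split evenly across the minor fragments:

```python
# Per-fragment sort budget implied by the session settings in the
# thread. The even split across fragments is the email's assumption,
# not a verified statement about Drill's memory allocator.
max_query_memory_per_node = 8_147_483_648  # bytes, from ALTER SESSION
width_max_per_node = 4                     # minor fragments per node

per_fragment = max_query_memory_per_node // width_max_per_node
print(per_fragment)                        # 2036870912 bytes
print(round(per_fragment / 2**30, 2))      # ~1.9 GiB per fragment
```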