That's not really how it works. The only "spilling" to disk occurs during
External Sort, and the spill files are not created based on partition.

What makes you think it is spilling prematurely?
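
If you want to double-check what the sort has to work with, one quick thing
is to look at the option values in effect for the session. A minimal sketch
(the sys.options filter below is only illustrative; the exact set of
sort- and memory-related options differs a bit between Drill versions):

-- list the sort/memory related options currently in effect
select * from sys.options
where name like '%sort%' or name like '%memory%';

The query profile in the web UI is also a good place to confirm whether the
external sort actually spilled and how much memory each minor fragment got.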

On Wed, Aug 26, 2015 at 5:15 PM, rahul challapalli <
[email protected]> wrote:

> Steven, Jason :
>
> Below is my understanding of when we should spill to disk while performing
> a sort. Let me know if I am missing anything
>
> alter session set `planner.width.max_per_node` = 4;
> alter session set `planner.memory.max_query_memory_per_node` = 8147483648;
> (~8GB)
> create table lineitem partition by (l_shipdate, l_receiptdate) as select *
> from dfs.`/drill/testdata/tpch100/lineitem`;
>
> 1. The above query creates 4 minor fragments and each minor fragment gets
> ~2GB for the sort phase.
> 2. Once a minor fragment consumes ~2GB of memory, it starts spilling each
> partition into a separate file on disk
> 3. The spilled files would be of different sizes.
> 4. Now if it is a regular CTAS (with no partition by clause), each spilled
> file should be ~2GB in size
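>
> (If that ~8GB is split evenly across the 4 minor fragments, the math is
> simply 8147483648 / 4 = 2036870912 bytes, i.e. about 2GB per fragment for
> the sort, which is where the ~2GB in step 1 comes from. That even split is
> my assumption, not something I have verified.)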
>
> I just have a hunch that we are spilling a little early :)
>
> - Rahul
>
>
> On Wed, Aug 26, 2015 at 4:49 PM, rahul challapalli <
> [email protected]> wrote:
>
> > Jason,
> >
> > What you described is exactly my understanding.
> >
> > I did kick off a run after setting `store.partition.hash_distribute`. It
> > is still running. I am expecting the number of files to be slightly more
> > than or equal to 75780. (As the default parquet block size should be
> > sufficient for most of the partitions.)
> >
> > - Rahul
> >
> >
> >
> > On Wed, Aug 26, 2015 at 4:36 PM, Jason Altekruse <
> [email protected]
> > > wrote:
> >
> >> I feel like there is a little misunderstanding here.
> >>
> >> Rahul, did you try setting the option that Steven suggested?
> >> `store.partition.hash_distribute`
> >>
> >> This will cause a re-distribution of the data so that the rows that
> >> belong in a particular partition will all be written by a single writer.
> >> They will not necessarily be all in one file, as we have a limit on file
> >> sizes and I don't think we cap partition size.
> >>
> >> The default behavior is not to re-distribute, because it is expensive.
> >> This however means that every fragment will write out a file for
> >> whichever keys appear in the data that ends up at that fragment.
> >>
> >> If there is a large number of fragments and the data is spread out pretty
> >> randomly, then there is a reasonable case for turning on this option to
> >> co-locate data in a single partition to a single writer to reduce the
> >> number of smaller files. There is no magic formula for when it is best to
> >> turn on this option, but in most cases it will reduce the number of files
> >> produced.
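> >>
> >> For example, something along these lines should keep each (l_shipdate,
> >> l_receiptdate) combination on a single writer (the table name
> >> lineitem_distributed is just a placeholder so it doesn't clash with the
> >> earlier CTAS):
> >>
> >> alter session set `store.partition.hash_distribute` = true;
> >> create table lineitem_distributed partition by (l_shipdate, l_receiptdate)
> >> as select * from dfs.`/drill/testdata/tpch100/lineitem`;
> >>
> >> You may still get more than one file for a partition that exceeds the
> >> parquet block size, but you should no longer see the per-fragment
> >> multiplier on the file count.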
> >>
> >>
> >>
> >> On Wed, Aug 26, 2015 at 3:48 PM, rahul challapalli <
> >> [email protected]> wrote:
> >>
> >> > Well, this is for generating some test data.
> >> >
> >> > - Rahul
> >> >
> >> > On Wed, Aug 26, 2015 at 3:47 PM, Andries Engelbrecht <
> >> > [email protected]> wrote:
> >> >
> >> > > Looks like Drill is doing the partitioning as requested then. May
> >> > > not be optimal though.
> >> > >
> >> > > Is there a reason why you want to subpartition this much? You may be
> >> > > better off to just partition by l_shipdate (not shipmate, autocorrect
> >> > > got me there). Or use columns with much lower cardinality to test
> >> > > subpartitioning.
> >> > >
> >> > > —Andries
> >> > >
> >> > >
> >> > > > On Aug 26, 2015, at 3:05 PM, rahul challapalli <
> >> > > [email protected]> wrote:
> >> > > >
> >> > > > Steven,
> >> > > >
> >> > > > You were right. The count is 606240 which is 8*75780.
> >> > > >
> >> > > >
> >> > > > Stefan & Andries,
> >> > > >
> >> > > > Below is the distinct count or cardinality
> >> > > >
> >> > > > select count(*) from (select l_shipdate, l_receiptdate from
> >> > > > dfs.`/drill/testdata/tpch100/lineitem` group by l_shipdate,
> >> > > > l_receiptdate) sub;
> >> > > > +---------+
> >> > > > | EXPR$0  |
> >> > > > +---------+
> >> > > > | 75780   |
> >> > > > +---------+
> >> > > >
> >> > > > - Rahul
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Wed, Aug 26, 2015 at 1:26 PM, Andries Engelbrecht <
> >> > > > [email protected]> wrote:
> >> > > >
> >> > > >> What is the distinct count for these columns? IIRC TPC-H has at
> >> > > >> least 5 years of data irrespective of SF, so you are requesting a
> >> > > >> lot of partitions. 76K sounds about right for 5 years of TPCH
> >> > > >> l_shipdate and correlating receipt date data; your query doesn’t
> >> > > >> count the actual files.
> >> > > >>
> >> > > >> Try to partition just on the l_shipdate column first.
> >> > > >>
> >> > > >> —Andries
> >> > > >>
> >> > > >>
> >> > > >>> On Aug 26, 2015, at 12:34 PM, Stefán Baxter <
> >> > [email protected]
> >> > > >
> >> > > >> wrote:
> >> > > >>>
> >> > > >>> Hi,
> >> > > >>>
> >> > > >>> Is it possible that the combination of values of (l_shipdate,
> >> > > >>> l_receiptdate) has a very high cardinality?
> >> > > >>> I would think you are creating partition files for a small subset
> >> > > >>> of the data.
> >> > > >>>
> >> > > >>> Please keep in mind that I know nothing about TPCH SF100 and only
> >> > > >>> a little about Drill :).
> >> > > >>>
> >> > > >>> Regards,
> >> > > >>> -Stefan
> >> > > >>>
> >> > > >>> On Wed, Aug 26, 2015 at 7:30 PM, Steven Phillips <
> [email protected]>
> >> > > wrote:
> >> > > >>>
> >> > > >>>> It would be helpful if you could figure out what the file count
> >> > > >>>> is. But here are some thoughts:
> >> > > >>>>
> >> > > >>>> What is the value of the option:
> >> > > >>>> store.partition.hash_distribute
> >> > > >>>>
> >> > > >>>> If it is false, which it is by default, then every fragment will
> >> > > >>>> potentially have data in every partition. In this case, that
> >> > > >>>> could increase the number of files by a factor of 8.
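> >> > > >>>>
> >> > > >>>> (That factor of 8 would be the total number of minor fragments
> >> > > >>>> writing in parallel: planner.width.max_per_node = 4 on a 2 node
> >> > > >>>> cluster gives 2 * 4 = 8 writers.) If you want to confirm what the
> >> > > >>>> option is currently set to, a plain sys.options lookup works:
> >> > > >>>>
> >> > > >>>> select * from sys.options
> >> > > >>>> where name = 'store.partition.hash_distribute';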
> >> > > >>>>
> >> > > >>>> On Wed, Aug 26, 2015 at 12:21 PM, rahul challapalli <
> >> > > >>>> [email protected]> wrote:
> >> > > >>>>
> >> > > >>>>> Drillers,
> >> > > >>>>>
> >> > > >>>>> I executed the below query on TPCH SF100 with drill and it took
> >> > > >>>>> ~2hrs to complete on a 2 node cluster.
> >> > > >>>>>
> >> > > >>>>> alter session set `planner.width.max_per_node` = 4;
> >> > > >>>>> alter session set `planner.memory.max_query_memory_per_node` =
> >> > > >>>>> 8147483648;
> >> > > >>>>> create table lineitem partition by (l_shipdate, l_receiptdate)
> >> > > >>>>> as select * from dfs.`/drill/testdata/tpch100/lineitem`;
> >> > > >>>>>
> >> > > >>>>> The below query returned 75780, so I expected drill to create
> >> > > >>>>> the same number of files, or maybe a little more. But drill
> >> > > >>>>> created so many files that a "hadoop fs -count" command failed
> >> > > >>>>> with a "GC overhead limit exceeded". (I did not change the
> >> > > >>>>> default parquet block size.)
> >> > > >>>>>
> >> > > >>>>> select count(*) from (select l_shipdate, l_receiptdate from
> >> > > >>>>> dfs.`/drill/testdata/tpch100/lineitem` group by l_shipdate,
> >> > > >>>>> l_receiptdate) sub;
> >> > > >>>>> +---------+
> >> > > >>>>> | EXPR$0  |
> >> > > >>>>> +---------+
> >> > > >>>>> | 75780   |
> >> > > >>>>> +---------+
> >> > > >>>>>
> >> > > >>>>>
> >> > > >>>>> Any thoughts on why drill is creating so many files?
> >> > > >>>>>
> >> > > >>>>> - Rahul
> >> > > >>>>>
> >> > > >>>>
> >> > > >>
> >> > > >>
> >> > >
> >> > >
> >> >
> >>
> >
> >
>
