Steven, Jason:

Below is my understanding of when we should spill to disk while performing
a sort. Let me know if I am missing anything.

alter session set `planner.width.max_per_node` = 4;
alter session set `planner.memory.max_query_memory_per_node` = 8147483648;
(~8GB)
create table lineitem partition by (l_shipdate, l_receiptdate) as select *
from dfs.`/drill/testdata/tpch100/lineitem`;

1. The above query creates 4 minor fragments and each minor fragment gets
~2GB for the sort phase (see the rough arithmetic below).
2. Once a minor fragment consumes ~2GB of memory, it starts spilling each
partition into a separate file on disk.
3. The spilled files would be of different sizes.
4. For a regular CTAS (with no partition by clause), each spilled file should
be approximately 2GB in size.

I just have a hunch that we are spilling a little early :)
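
Before digging further I will double check the effective values with something
like the below (I am assuming the sys.options layout here; the exact columns
differ a bit across Drill versions, hence the select *):

select *
from sys.options
where name in ('planner.width.max_per_node',
               'planner.memory.max_query_memory_per_node',
               'store.partition.hash_distribute');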

- Rahul


On Wed, Aug 26, 2015 at 4:49 PM, rahul challapalli <
[email protected]> wrote:

> Jason,
>
> What you described is exactly my understanding.
>
> I did kick off a run after setting `store.partition.hash_distribute`. It is
> still running. I am expecting the number of files to be slightly more than or
> equal to 75780. (As the default parquet block size should be sufficient for
> most of the partitions)
>
> - Rahul
>
>
>
> On Wed, Aug 26, 2015 at 4:36 PM, Jason Altekruse <[email protected]
> > wrote:
>
>> I feel like there is a little misunderstanding here.
>>
>> Rahul, did you try setting the option that Steven suggested?
>> `store.partition.hash_distribute`
>>
>> This will cause a re-distribution of the data so that the rows that belong
>> in a particular partition will all be written by a single writer. They will
>> not necessarily all be in one file, as we have a limit on file sizes and I
>> don't think we cap partition size.
>>
>> The default behavior is not to re-distribute, because it is expensive. This,
>> however, means that every fragment will write out a file for whichever keys
>> appear in the data that ends up at that fragment.
>>
>> If there is a large number of fragments and the data is spread out pretty
>> randomly, then there is a reasonable case for turning on this option to
>> co-locate data in a single partition to a single writer to reduce the
>> number of smaller files. There is no magic formula for when it is best to
>> turn on this option, but in most cases it will reduce the number of files
>> produced.
>>
>>
>>
>> On Wed, Aug 26, 2015 at 3:48 PM, rahul challapalli <
>> [email protected]> wrote:
>>
>> > Well, this is for generating some test data.
>> >
>> > - Rahul
>> >
>> > On Wed, Aug 26, 2015 at 3:47 PM, Andries Engelbrecht <
>> > [email protected]> wrote:
>> >
>> > > Looks like Drill is doing the partitioning as requested then. It may not
>> > > be optimal though.
>> > >
>> > > Is there a reason why you want to subpartition this much? You may be
>> > > better off to just partition by l_shipdate (not shipmate, autocorrect got
>> > > me there). Or use columns with much lower cardinality to test
>> > > subpartitioning.
>> > >
>> > > —Andries
>> > >
>> > >
>> > > > On Aug 26, 2015, at 3:05 PM, rahul challapalli <
>> > > [email protected]> wrote:
>> > > >
>> > > > Steven,
>> > > >
>> > > > You were right. The count is 606240 which is 8*75780.
>> > > >
>> > > >
>> > > > Stefan & Andries,
>> > > >
>> > > > Below is the distinct count or cardinality
>> > > >
>> > > > select count(*) from (select l_shipdate, l_receiptdate from
>> > > > dfs.`/drill/testdata/tpch100/lineitem` group by l_shipdate,
>> > > > l_receiptdate) sub;
>> > > > +---------+
>> > > > | EXPR$0  |
>> > > > +---------+
>> > > > | 75780   |
>> > > > +---------+
>> > > >
>> > > > - Rahul
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > On Wed, Aug 26, 2015 at 1:26 PM, Andries Engelbrecht <
>> > > > [email protected]> wrote:
>> > > >
>> > > >> What is the distinct count for these columns? IIRC TPC-H has at least
>> > > >> 5 years of data irrespective of SF, so you are requesting a lot of
>> > > >> partitions. 76K sounds about right for 5 years of TPCH shipmate and
>> > > >> correlating receipt date data, and your query doesn’t count the actual
>> > > >> files.
>> > > >>
>> > > >> Try to partition just on the shipmate column first.
>> > > >>
>> > > >> —Andries
>> > > >>
>> > > >>
>> > > >>> On Aug 26, 2015, at 12:34 PM, Stefán Baxter <
>> > [email protected]
>> > > >
>> > > >> wrote:
>> > > >>>
>> > > >>> Hi,
>> > > >>>
>> > > >>> Is it possible that the combination of (l_shipdate, l_receiptdate)
>> > > >>> values has a very high cardinality?
>> > > >>> I would think you are creating partition files for a small subset of
>> > > >>> the data.
>> > > >>>
>> > > >>> Please keep in mind that I know nothing about TPCH SF100 and only a
>> > > >>> little about Drill :).
>> > > >>>
>> > > >>> Regards,
>> > > >>> -Stefan
>> > > >>>
>> > > >>> On Wed, Aug 26, 2015 at 7:30 PM, Steven Phillips <[email protected]>
>> > > wrote:
>> > > >>>
>> > > >>>> It would be helpful if you could figure out what the file count is.
>> > > >>>> But here are some thoughts:
>> > > >>>>
>> > > >>>> What is the value of the option:
>> > > >>>> store.partition.hash_distribute
>> > > >>>>
>> > > >>>> If it is false, which it is by default, then every fragment will
>> > > >>>> potentially have data in every partition. In this case, that could
>> > > >>>> increase the number of files by a factor of 8.
>> > > >>>>
>> > > >>>> On Wed, Aug 26, 2015 at 12:21 PM, rahul challapalli <
>> > > >>>> [email protected]> wrote:
>> > > >>>>
>> > > >>>>> Drillers,
>> > > >>>>>
>> > > >>>>> I executed the below query on TPCH SF100 with drill and it took
>> > > >>>>> ~2hrs to complete on a 2 node cluster.
>> > > >>>>>
>> > > >>>>> alter session set `planner.width.max_per_node` = 4;
>> > > >>>>> alter session set `planner.memory.max_query_memory_per_node` =
>> > > >>>>> 8147483648;
>> > > >>>>> create table lineitem partition by (l_shipdate, l_receiptdate) as
>> > > >>>>> select * from dfs.`/drill/testdata/tpch100/lineitem`;
>> > > >>>>>
>> > > >>>>> The below query returned 75780, so I expected drill to create the
>> > > >>>>> same number of files, or maybe a little more. But drill created so
>> > > >>>>> many files that a "hadoop fs -count" command failed with a
>> > > >>>>> "GC overhead limit exceeded". (I did not change the default
>> > > >>>>> parquet block size.)
>> > > >>>>>
>> > > >>>>> select count(*) from (select l_shipdate, l_receiptdate from
>> > > >>>>> dfs.`/drill/testdata/tpch100/lineitem` group by l_shipdate,
>> > > >>>>> l_receiptdate) sub;
>> > > >>>>> +---------+
>> > > >>>>> | EXPR$0  |
>> > > >>>>> +---------+
>> > > >>>>> | 75780   |
>> > > >>>>> +---------+
>> > > >>>>>
>> > > >>>>>
>> > > >>>>> Any thoughts on why drill is creating so many files?
>> > > >>>>>
>> > > >>>>> - Rahul
>> > > >>>>>
>> > > >>>>
>> > > >>
>> > > >>
>> > >
>> > >
>> >
>>
>
>
