Thanks for clarifying. In the spill directory under a specific minor fragment, I saw files with different sizes: a few files are just 50MB each and a few are 1.5GB. It's not clear to me why the file sizes are so different.
- Rahul

On Wed, Aug 26, 2015 at 5:48 PM, Steven Phillips <[email protected]> wrote:

> That's not really how it works. The only "spilling" to disk occurs during
> External Sort, and the spill files are not created based on partition.
>
> What makes you think it is spilling prematurely?
>
> On Wed, Aug 26, 2015 at 5:15 PM, rahul challapalli <
> [email protected]> wrote:
>
> > Steven, Jason:
> >
> > Below is my understanding of when we should spill to disk while
> > performing a sort. Let me know if I am missing anything.
> >
> > alter session set `planner.width.max_per_node` = 4;
> > alter session set `planner.memory.max_query_memory_per_node` = 8147483648; (~8GB)
> > create table lineitem partition by (l_shipdate, l_receiptdate) as select *
> > from dfs.`/drill/testdata/tpch100/lineitem`;
> >
> > 1. The above query creates 4 minor fragments, and each minor fragment gets
> > ~2GB for the sort phase.
> > 2. Once a minor fragment consumes ~2GB of memory, it starts spilling each
> > partition into a separate file on disk.
> > 3. The spilled files would be of different sizes.
> > 4. For a regular CTAS (with no partition by clause), each spilled file
> > should be approximately ~2GB in size.
> >
> > I just have a hunch that we are spilling a little early :)
> >
> > - Rahul
> >
> > On Wed, Aug 26, 2015 at 4:49 PM, rahul challapalli <
> > [email protected]> wrote:
> >
> > > Jason,
> > >
> > > What you described is exactly my understanding.
> > >
> > > I did kick off a run after setting `store.partition.hash_distribute`. It
> > > is still running. I am expecting the number of files to be slightly more
> > > than or equal to 75780. (The default parquet block size should be
> > > sufficient for most of the partitions.)
> > >
> > > - Rahul
> > >
> > > On Wed, Aug 26, 2015 at 4:36 PM, Jason Altekruse <
> > > [email protected]> wrote:
> > >
> > >> I feel like there is a little misunderstanding here.
> > >> Rahul, did you try setting the option that Steven suggested?
> > >> `store.partition.hash_distribute`
> > >>
> > >> This will cause a re-distribution of the data so that the rows that
> > >> belong in a particular partition will all be written by a single writer.
> > >> They will not necessarily all be in one file, as we have a limit on file
> > >> sizes and I don't think we cap partition size.
> > >>
> > >> The default behavior is not to re-distribute, because it is expensive.
> > >> This, however, means that every fragment will write out a file for
> > >> whichever keys appear in the data that ends up at that fragment.
> > >>
> > >> If there is a large number of fragments and the data is spread out
> > >> pretty randomly, then there is a reasonable case for turning on this
> > >> option to co-locate the data in a single partition with a single writer
> > >> and reduce the number of smaller files. There is no magic formula for
> > >> when it is best to turn on this option, but in most cases it will reduce
> > >> the number of files produced.
> > >>
> > >> On Wed, Aug 26, 2015 at 3:48 PM, rahul challapalli <
> > >> [email protected]> wrote:
> > >>
> > >> > Well, this is for generating some test data.
> > >> >
> > >> > - Rahul
> > >> >
> > >> > On Wed, Aug 26, 2015 at 3:47 PM, Andries Engelbrecht <
> > >> > [email protected]> wrote:
> > >> >
> > >> > > Looks like Drill is doing the partitioning as requested then. It may
> > >> > > not be optimal though.
> > >> > >
> > >> > > Is there a reason why you want to subpartition this much? You may be
> > >> > > better off just partitioning by l_shipdate (not shipmate, autocorrect
> > >> > > got me there). Or use columns with much lower cardinality to test
> > >> > > subpartitioning.
> > >> > > —Andries
> > >> > >
> > >> > > > On Aug 26, 2015, at 3:05 PM, rahul challapalli <
> > >> > > > [email protected]> wrote:
> > >> > > >
> > >> > > > Steven,
> > >> > > >
> > >> > > > You were right. The count is 606240, which is 8 * 75780.
> > >> > > >
> > >> > > > Stefan & Andries,
> > >> > > >
> > >> > > > Below is the distinct count (cardinality):
> > >> > > >
> > >> > > > select count(*) from (select l_shipdate, l_receiptdate from
> > >> > > > dfs.`/drill/testdata/tpch100/lineitem`
> > >> > > > group by l_shipdate, l_receiptdate) sub;
> > >> > > > +---------+
> > >> > > > | EXPR$0  |
> > >> > > > +---------+
> > >> > > > | 75780   |
> > >> > > > +---------+
> > >> > > >
> > >> > > > - Rahul
> > >> > > >
> > >> > > > On Wed, Aug 26, 2015 at 1:26 PM, Andries Engelbrecht <
> > >> > > > [email protected]> wrote:
> > >> > > >
> > >> > > >> What is the distinct count for these columns? IIRC TPC-H has at
> > >> > > >> least 5 years of data irrespective of SF, so you are requesting a
> > >> > > >> lot of partitions. 76K sounds about right for 5 years of TPC-H
> > >> > > >> l_shipdate and correlating receipt date data; your query doesn't
> > >> > > >> count the actual files.
> > >> > > >>
> > >> > > >> Try to partition just on the l_shipdate column first.
> > >> > > >>
> > >> > > >> —Andries
> > >> > > >>
> > >> > > >>> On Aug 26, 2015, at 12:34 PM, Stefán Baxter <
> > >> > > >>> [email protected]> wrote:
> > >> > > >>>
> > >> > > >>> Hi,
> > >> > > >>>
> > >> > > >>> Is it possible that the combination of values of (l_shipdate,
> > >> > > >>> l_receiptdate) has a very high cardinality?
> > >> > > >>> I would think you are creating partition files for a small subset
> > >> > > >>> of the data.
> > >> > > >>> Please keep in mind that I know nothing about TPCH SF100 and
> > >> > > >>> only a little about Drill :).
> > >> > > >>>
> > >> > > >>> Regards,
> > >> > > >>> -Stefan
> > >> > > >>>
> > >> > > >>> On Wed, Aug 26, 2015 at 7:30 PM, Steven Phillips <
> > >> > > >>> [email protected]> wrote:
> > >> > > >>>
> > >> > > >>>> It would be helpful if you could figure out what the file count
> > >> > > >>>> is. But here are some thoughts:
> > >> > > >>>>
> > >> > > >>>> What is the value of the option:
> > >> > > >>>> store.partition.hash_distribute
> > >> > > >>>>
> > >> > > >>>> If it is false, which it is by default, then every fragment will
> > >> > > >>>> potentially have data in every partition. In this case, that
> > >> > > >>>> could increase the number of files by a factor of 8.
> > >> > > >>>>
> > >> > > >>>> On Wed, Aug 26, 2015 at 12:21 PM, rahul challapalli <
> > >> > > >>>> [email protected]> wrote:
> > >> > > >>>>
> > >> > > >>>>> Drillers,
> > >> > > >>>>>
> > >> > > >>>>> I executed the below query on TPCH SF100 with Drill and it took
> > >> > > >>>>> ~2hrs to complete on a 2 node cluster.
> > >> > > >>>>>
> > >> > > >>>>> alter session set `planner.width.max_per_node` = 4;
> > >> > > >>>>> alter session set `planner.memory.max_query_memory_per_node` = 8147483648;
> > >> > > >>>>> create table lineitem partition by (l_shipdate, l_receiptdate)
> > >> > > >>>>> as select * from dfs.`/drill/testdata/tpch100/lineitem`;
> > >> > > >>>>>
> > >> > > >>>>> The below query returned 75780, so I expected Drill to create
> > >> > > >>>>> the same number of files, or maybe a little more. But Drill
> > >> > > >>>>> created so many files that a "hadoop fs -count" command failed
> > >> > > >>>>> with a "GC overhead limit exceeded" error.
> > >> > > >>>>> (I did not change the default parquet block size.)
> > >> > > >>>>>
> > >> > > >>>>> select count(*) from (select l_shipdate, l_receiptdate from
> > >> > > >>>>> dfs.`/drill/testdata/tpch100/lineitem`
> > >> > > >>>>> group by l_shipdate, l_receiptdate) sub;
> > >> > > >>>>> +---------+
> > >> > > >>>>> | EXPR$0  |
> > >> > > >>>>> +---------+
> > >> > > >>>>> | 75780   |
> > >> > > >>>>> +---------+
> > >> > > >>>>>
> > >> > > >>>>> Any thoughts on why Drill is creating so many files?
> > >> > > >>>>>
> > >> > > >>>>> - Rahul
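[Editor's sketch] The file-count arithmetic in the thread can be checked directly: with `store.partition.hash_distribute` left at its default of false, each minor fragment writes its own file for every partition key it sees, so the worst case is fragments times distinct keys. The "8" here presumably comes from `planner.width.max_per_node` = 4 on a 2 node cluster, which the thread implies but does not state outright.

```python
# Worst-case file count when every fragment writes a file per
# partition key it encounters (the behavior Jason and Steven
# describe for hash_distribute = false). Illustration only,
# not Drill code.
num_fragments = 8        # presumably 4 per node x 2 nodes
distinct_keys = 75780    # distinct (l_shipdate, l_receiptdate) pairs

worst_case_files = num_fragments * distinct_keys
print(worst_case_files)  # 606240, matching the count Rahul observed
```

With hash distribution enabled, rows for a given key are routed to a single writer, so the count drops back toward the 75780 distinct keys, plus extra files wherever a partition exceeds the parquet block size.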

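[Editor's sketch] The ~2GB-per-fragment sort budget Rahul cites works out as follows, assuming (as his email does, without confirmation from the Drill side) that the query memory per node is split evenly across the minor fragments:

```python
# Per-fragment sort budget implied by the session settings in the
# thread. The even split across fragments is the email's assumption,
# not a verified statement about Drill's memory allocator.
max_query_memory_per_node = 8_147_483_648  # bytes, from ALTER SESSION
width_max_per_node = 4                     # minor fragments per node

per_fragment = max_query_memory_per_node // width_max_per_node
print(per_fragment)                        # 2036870912 bytes
print(round(per_fragment / 2**30, 2))      # ~1.9 GiB per fragment
```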