Drillers,

I executed the query below on TPCH SF100 with Drill, and it took ~2 hrs to complete on a 2-node cluster.

alter session set `planner.width.max_per_node` = 4;
alter session set `planner.memory.max_query_memory_per_node` = 8147483648;
create table lineitem partition by (l_shipdate, l_receiptdate) as select * from dfs.`/drill/testdata/tpch100/lineitem`;

The query below returned 75780, so I expected Drill to create the same number of files, or maybe a few more. But Drill created so many files that a "hadoop fs -count" command failed with "GC overhead limit exceeded". (I did not change the default parquet block size.)

select count(*) from (select l_shipdate, l_receiptdate from dfs.`/drill/testdata/tpch100/lineitem` group by l_shipdate, l_receiptdate) sub;

+---------+
| EXPR$0  |
+---------+
| 75780   |
+---------+

Any thoughts on why Drill is creating so many files?

- Rahul
