Rahul Challapalli created DRILL-5472:
----------------------------------------
Summary: Parquet reader generating low-density batches causing
Sort operator to spill unnecessarily
Key: DRILL-5472
URL: https://issues.apache.org/jira/browse/DRILL-5472
Project: Apache Drill
Issue Type: Bug
Components: Execution - Relational Operators, Storage - Parquet
Reporter: Rahul Challapalli
Assignee: Paul Rogers
git.commit.id.abbrev=1e0a14c
The Parquet file used in the query below is ~20 MB. Its uncompressed size is
~1.2 GB. The query contains a sort that is given ~6 GB of memory for a single
fragment, and yet it spills.
{code}
select * from (select * from
dfs.`/drill/testdata/resource-manager/all_types_large` s order by
s.missing12.x) d where d.missing3 is false;
{code}
The profile indicates that the above query spilled twice. The profile and
the logs are attached.
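
As a rough sanity check on the numbers above, the sketch below estimates how sparse the incoming batches would have to be before ~1.2 GB of data no longer fits in a ~6 GB sort budget. It rests on two assumptions that are not stated in this report: "density" is taken to mean the fraction of allocated batch memory that actually holds data, and the whole sort budget is treated as available for buffering batches. The function and variable names are illustrative only and are not Drill APIs.

```python
# Back-of-the-envelope check; all names are illustrative, not Drill APIs.
GB = 1024 ** 3


def in_memory_footprint(data_bytes, density):
    """Memory held by batches when only `density` of the allocated bytes carry data.

    Assumption: density is a fraction in (0, 1]; 1.0 means fully packed batches.
    """
    if not 0 < density <= 1:
        raise ValueError("density must be in (0, 1]")
    return data_bytes / density


uncompressed = 1.2 * GB  # uncompressed data size, from the report
sort_budget = 6 * GB     # memory given to the sort, from the report

# Density below which the buffered batches alone exceed the sort's budget.
threshold = uncompressed / sort_budget

for density in (1.0, 0.5, 0.1):
    footprint = in_memory_footprint(uncompressed, density)
    print(f"density={density:.0%} footprint={footprint / GB:.1f} GB "
          f"spills={footprint > sort_budget}")
```

Under these assumptions the sort should only need to spill once batch density falls below roughly 20%, which is consistent with the low-density batches named in the summary. The real threshold may differ, since the sort also needs working memory beyond the buffered batches.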