[ https://issues.apache.org/jira/browse/DRILL-5472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997100#comment-15997100 ]
Paul Rogers commented on DRILL-5472:
------------------------------------
This is a known issue with Parquet, but not currently a high priority.
The expectation is that it will be resolved as a side effect of the fix for
DRILL-5211. For that bug, we must limit vector sizes to 16 MB. At present,
the Parquet reader tries, but fails, to limit vector sizes; that failure
produces vectors of unpredictable size and low-density batches. Fixing the
Parquet vector limit to avoid memory fragmentation should, as a side effect,
also reduce the low-density problem without this issue having to be tackled
on its own.
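For illustration, a minimal self-contained sketch of the batch-flush
discipline such a cap implies (the class and constant names are assumptions
for this example, not Drill's actual reader code): once a write would push a
vector past the cap, the current batch is flushed, so density stays close to
100% except for the final partial batch.
{code}
// Minimal sketch only -- not Drill's reader. Models a reader that flushes
// the current batch before a write would push the vector past a hard cap,
// so allocated memory tracks the data actually written (high density).
public class VectorCapSketch {

  static final int VECTOR_SIZE_LIMIT = 16 * 1024 * 1024; // the 16 MB cap

  int allocatedBytes = VECTOR_SIZE_LIMIT; // assume vectors pre-allocated at the cap
  int usedBytes = 0;

  void write(int valueBytes) {
    if (usedBytes + valueBytes > VECTOR_SIZE_LIMIT) {
      flush(); // cap enforced: start a new batch instead of over-allocating
    }
    usedBytes += valueBytes;
  }

  void flush() {
    double density = (double) usedBytes / allocatedBytes; // used / allocated
    System.out.printf("flush: %,d bytes used, density %.0f%%%n",
        usedBytes, density * 100);
    usedBytes = 0;
  }

  public static void main(String[] args) {
    VectorCapSketch reader = new VectorCapSketch();
    for (int i = 0; i < 100_000; i++) {
      reader.write(512); // pretend each record writes 512 bytes into one vector
    }
    reader.flush(); // final partial batch: the only low-density flush
  }
}
{code}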
> Parquet reader generating low-density batches causing Sort operator to spill unnecessarily
> -------------------------------------------------------------------------------------------
>
> Key: DRILL-5472
> URL: https://issues.apache.org/jira/browse/DRILL-5472
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Relational Operators, Storage - Parquet
> Reporter: Rahul Challapalli
> Assignee: Paul Rogers
> Attachments: drill5472.log, drill5472.parquet, drill5472.sys.drill
>
>
> git.commit.id.abbrev=1e0a14c
> The parquet file used in the query below is ~20 MB; its uncompressed size is
> ~1.2 GB. The query contains a sort that is given ~6 GB of memory for a single
> fragment, and yet it spills.
> {code}
> select *
> from (select *
>       from dfs.`/drill/testdata/resource-manager/all_types_large` s
>       order by s.missing12.x) d
> where d.missing3 is false;
> {code}
> The profile indicates that the query spilled twice. The profile and logs are
> attached.
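For a rough sense of why the sort spills despite a ~6 GB budget, a
back-of-envelope sketch; the 5% density figure below is an assumption for
illustration, not a number from this report (the real figures are in the
attached profile):
{code}
// Back-of-envelope arithmetic for the spill. The 5% density is an assumed
// figure for illustration; the actual density is visible in the profile.
public class SpillArithmetic {
  public static void main(String[] args) {
    double dataGB = 1.2;       // uncompressed data, from the description
    double sortBudgetGB = 6.0; // memory granted to the sort fragment
    double density = 0.05;     // assumed fraction of batch memory holding data

    // The sort must hold whole batches, so its footprint is data / density.
    double footprintGB = dataGB / density;
    System.out.printf("footprint ~%.0f GB vs budget %.0f GB -> spills: %b%n",
        footprintGB, sortBudgetGB, footprintGB > sortBudgetGB);
  }
}
{code}
At 5% density, holding 1.2 GB of data means holding ~24 GB of allocated
vectors, several times the sort's budget, so spilling follows even with
modest data volumes.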