[ https://issues.apache.org/jira/browse/DRILL-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385395#comment-16385395 ]
Paul Rogers commented on DRILL-6147:
------------------------------------

To follow up, we should look at all sides of the issue. One factor overlooked in my previous note is that code now is better than code later. DRILL-6147 is available today and will immediately give users a performance boost. The result set loader is large and will take some months to commit, so it can't offer a benefit until then. It is hard to argue that we should wait.

Let's get DRILL-6147 in now, then revisit the issue later (doing the proposed test) once the result set loader is available.

And, as discussed, DRILL-6147 works only for the flat Parquet reader. We'll need the result set loader for the Parquet reader that reads nested types.

> Limit batch size for Flat Parquet Reader
> ----------------------------------------
>
>                 Key: DRILL-6147
>                 URL: https://issues.apache.org/jira/browse/DRILL-6147
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Parquet
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Major
>             Fix For: 1.14.0
>
>
> The Parquet reader currently uses a hard-coded batch size limit (32k rows)
> when creating scan batches; there is no parameter nor any logic for
> controlling the amount of memory used. This enhancement will allow Drill to
> take an extra input parameter to control direct memory usage.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)