[jira] [Commented] (DRILL-6147) Limit batch size for Flat Parquet Reader

ASF GitHub Bot (JIRA) Fri, 29 Jun 2018 12:36:07 -0700


    [ https://issues.apache.org/jira/browse/DRILL-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528124#comment-16528124 ]


ASF GitHub Bot commented on DRILL-6147:
---------------------------------------

ilooner commented on issue #1330: DRILL-6147: Adding Columnar Parquet Batch 
Sizing functionality
URL: https://github.com/apache/drill/pull/1330#issuecomment-401453225
 
 
   @vrozov My understanding was the following. QA has set up automated tests of 
both the performance and the correctness of batch sizing on a real cluster. 
Each batch sizing change has unit tests to validate batch size. But on a real 
cluster with real data, the only viable way for QA to check the batch sizes 
output by an operator right now is through logging. Since Drill takes testing 
on real clusters seriously and aims to do more than just unit tests, I think 
this is perfectly acceptable.
   
   Since logging has overhead, and QA wanted to automate both the performance 
and correctness tests, they required the ability to turn logging off via 
sqlline. This was the approach agreed on by developers and testers in the Drill 
community, including @sachouche, @bitblender, @ppadma, Robert (I don't know his 
GitHub ID), and @priteshm.
   
   Given the scope of agreement in the community, the fact that similar changes 
have already been merged, and the minor impact on the Drill code itself (~20 
lines), I suggest moving this discussion to a separate change. In my 
investigation I was not able to find a viable alternative to this approach. 
@vrozov, perhaps you could present an alternative approach on the dev list and 
lead the proposal. It would be a great help moving forward.
   
   In the meantime the changes proposed here represent a valuable performance 
improvement for the Drill community, so let's not hold up this change over this.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Limit batch size for Flat Parquet Reader
> ----------------------------------------
>
>                 Key: DRILL-6147
>                 URL: https://issues.apache.org/jira/browse/DRILL-6147
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Parquet
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Major
>              Labels: ready-to-commit
>             Fix For: 1.14.0
>
>
> The Parquet reader currently uses a hard-coded batch size limit (32k rows) 
> when creating scan batches; there is no parameter nor any logic for 
> controlling the amount of memory used. This enhancement will allow Drill to 
> take an extra input parameter to control direct memory usage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
