[ 
https://issues.apache.org/jira/browse/DRILL-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385489#comment-16385489
 ] 

Aman Sinha edited comment on DRILL-6147 at 3/5/18 2:18 AM:
-----------------------------------------------------------

{quote}The question for you and [~sachouche] is simply this. Given that we have 
a working mechanism, does it make sense to invent another one? Do we want to 
have duplicate maintenance costs? Have to make changes in two places? And so on?
{quote}
Certainly. If the two methods perform similarly when reading flat data, we 
should not keep a separate code path. The result set loader will be more 
flexible, so we should go with that. I agree about the testing parameters that 
should be considered when evaluating both readers. In my mind, the TPC-DS data 
is already a well-established testbed for flat structures: it has NULL values 
and multiple variable-width columns, so we should use it for the experiments.


was (Author: amansinha100):
{quote}The question for you and [~sachouche] is simply this. Given that we have 
a working mechanism, does it make sense to invent another one? Do we want to 
have duplicate maintenance costs? Have to make changes in two places? And so on?
{quote}
Certainly. If the two methods perform similarly when reading flat data, we 
should not keep a separate code path. The result set loader will be more 
flexible, so we should go with that. I agree about the testing parameters that 
should be considered when evaluating both readers.

> Limit batch size for Flat Parquet Reader
> ----------------------------------------
>
>                 Key: DRILL-6147
>                 URL: https://issues.apache.org/jira/browse/DRILL-6147
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Parquet
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Major
>             Fix For: 1.14.0
>
>
> The Parquet reader currently uses a hard-coded batch size limit (32k rows) 
> when creating scan batches; there is no parameter nor any logic for 
> controlling the amount of memory used. This enhancement will allow Drill to 
> take an extra input parameter to control direct memory usage.
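
The relationship between a memory budget and a per-batch row limit described above can be sketched as follows. This is only an illustration and not Drill's actual implementation: the class name, the method name `rowsPerBatch`, and both of its parameters are hypothetical, and the sketch assumes that "32k rows" means 32 * 1024 and that an average row width estimate is available.

```java
public class BatchSizeSketch {
    // Assumption: the "32k rows" in the issue description means 32 * 1024.
    static final int MAX_ROWS_PER_BATCH = 32 * 1024;

    /**
     * Hypothetical helper: derives a row limit for one batch from a memory
     * budget and an estimated average row width, never exceeding the
     * existing row cap and never returning fewer than one row.
     */
    static int rowsPerBatch(long memoryBudgetBytes, long estimatedRowWidthBytes) {
        if (memoryBudgetBytes <= 0 || estimatedRowWidthBytes <= 0) {
            throw new IllegalArgumentException("budget and row width must be positive");
        }
        // How many rows of the estimated width fit in the budget.
        long rowsThatFit = memoryBudgetBytes / estimatedRowWidthBytes;
        // Clamp to the range [1, MAX_ROWS_PER_BATCH].
        long clamped = Math.max(1L, Math.min(rowsThatFit, MAX_ROWS_PER_BATCH));
        return (int) clamped;
    }

    public static void main(String[] args) {
        long budget = 16L * 1024 * 1024; // illustrative 16 MiB budget
        // Narrow rows: the row cap is the binding limit.
        System.out.println(rowsPerBatch(budget, 100));
        // Wide rows: the memory budget is the binding limit.
        System.out.println(rowsPerBatch(budget, 4096));
    }
}
```

With narrow rows the fixed row cap still applies, while with wide rows the memory budget reduces the row count, which is the case a fixed row limit alone cannot handle.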



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
