[ https://issues.apache.org/jira/browse/DRILL-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385306#comment-16385306 ]

Aman Sinha commented on DRILL-6147:
-----------------------------------

We should make every effort not to penalize reads of structured/flat data in 
order to unify everything under one type of reader. [~paul-rogers] you 
referenced my name above in a few places (btw, it would be good to 'tag' the 
name for notification), e.g. _"Meanwhile, Aman is building a solution that says 
that actual users need complex structures. (Big data is often stored 
denormalized.)"_, which is in reference to 
https://issues.apache.org/jira/browse/DRILL-5999. I don't want that requirement 
to compromise the performance of the structured types (well, maybe a very small 
percentage of degradation is acceptable). The reality is that Drill is compared 
against other engines using benchmarks that consist mostly of structured types, 
and we must keep pushing to improve on those. The second point concerns the 
potential for future vectorized filtering operations, which I mentioned in the 
previous comment.

It would be best to discuss this over a hangout session with the interested 
parties.

 

> Limit batch size for Flat Parquet Reader
> ----------------------------------------
>
>                 Key: DRILL-6147
>                 URL: https://issues.apache.org/jira/browse/DRILL-6147
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Parquet
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Major
>             Fix For: 1.14.0
>
>
> The Parquet reader currently uses a hard-coded batch size limit (32k rows) 
> when creating scan batches; there is no parameter and no logic for 
> controlling the amount of memory used. This enhancement will allow Drill to 
> take an extra input parameter to control direct-memory usage.
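The sizing logic described in the issue could be sketched as below. This is a
hypothetical illustration only, not Drill's actual implementation: the class
name `BatchSizer`, the `batchMemoryBudgetBytes` parameter, and the
`estimatedRowWidthBytes` estimate are all invented for the example. The idea is
to keep the legacy 32k-row cap as an upper bound while also capping the batch
by a configured direct-memory budget divided by an estimated row width.

```java
// Hypothetical sketch of memory-bounded batch sizing for a scan reader.
// None of these names are Drill APIs; they illustrate the technique only.
public final class BatchSizer {

    // The current hard-coded per-batch row cap mentioned in the issue.
    static final int DEFAULT_ROW_LIMIT = 32 * 1024;

    /**
     * Returns the largest row count whose estimated footprint fits the
     * memory budget, still bounded above by the legacy 32k-row limit and
     * below by a single row.
     */
    static int rowLimit(long batchMemoryBudgetBytes, int estimatedRowWidthBytes) {
        long byBudget = batchMemoryBudgetBytes / Math.max(1, estimatedRowWidthBytes);
        return (int) Math.max(1, Math.min(DEFAULT_ROW_LIMIT, byBudget));
    }

    public static void main(String[] args) {
        // 1 MB budget with 100-byte rows: the memory budget, not the 32k
        // cap, decides the batch size here.
        System.out.println(rowLimit(1024L * 1024, 100));
    }
}
```

With a large budget the method degenerates to the existing 32k behavior, so
flat/structured reads pay no extra cost beyond one division per batch, which
speaks to the performance concern raised above.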



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
