[
https://issues.apache.org/jira/browse/DRILL-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385489#comment-16385489
]
Aman Sinha edited comment on DRILL-6147 at 3/5/18 2:18 AM:
-----------------------------------------------------------
{quote}The question for you and [~sachouche] is simply this. Given that we have
a working mechanism, does it make sense to invent another one? Do we want to
have duplicate maintenance costs? Have to make changes in two places? And so on?
{quote}
Certainly, if the two methods have similar performance when reading flat data,
we should not maintain a separate code path. The result set loader will be more
flexible, so we should go with that. I agree about the testing parameters that
should be considered when evaluating both readers. In my view, the TPC-DS data
is already a well-established testbed for flat structures: it has NULL values
and multiple variable-width columns, so we should use it for the experiments.
was (Author: amansinha100):
{quote}The question for you and [~sachouche] is simply this. Given that we have
a working mechanism, does it make sense to invent another one? Do we want to
have duplicate maintenance costs? Have to make changes in two places? And so on?
{quote}
Certainly, if the two methods have similar performance when reading flat data,
we should not maintain a separate code path. The result set loader will be more
flexible, so we should go with that. I agree about the testing parameters that
should be considered when evaluating both readers.
> Limit batch size for Flat Parquet Reader
> ----------------------------------------
>
> Key: DRILL-6147
> URL: https://issues.apache.org/jira/browse/DRILL-6147
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Parquet
> Reporter: salim achouche
> Assignee: salim achouche
> Priority: Major
> Fix For: 1.14.0
>
>
> The Parquet reader currently uses a hard-coded batch size limit (32k rows)
> when creating scan batches; there is neither a parameter nor any logic for
> controlling the amount of memory used. This enhancement will allow Drill to
> take an extra input parameter to control direct memory usage.
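
As a rough illustration of the enhancement described above, the sketch below derives a per-batch row count from a memory budget instead of relying only on a fixed row cap. It is not Drill code: the class `BatchSizer`, the method `rowsPerBatch`, and both parameters are hypothetical names, the row-width estimate is assumed to be supplied by the caller, and "32k" is read here as 32 * 1024.

```java
/** Hypothetical helper for illustration only; not part of Drill's API. */
final class BatchSizer {
    /** Row cap mentioned in the issue, assuming "32k" means 32 * 1024. */
    static final int MAX_ROWS_PER_BATCH = 32 * 1024;

    private BatchSizer() {}

    /**
     * Returns how many rows fit in the given memory budget, never exceeding
     * the fixed row cap and never dropping below one row.
     *
     * @param memoryBudgetBytes      assumed per-batch direct memory budget
     * @param estimatedRowWidthBytes assumed average row width, caller-supplied
     */
    static int rowsPerBatch(long memoryBudgetBytes, long estimatedRowWidthBytes) {
        if (memoryBudgetBytes <= 0 || estimatedRowWidthBytes <= 0) {
            throw new IllegalArgumentException("budget and row width must be positive");
        }
        long rowsThatFit = memoryBudgetBytes / estimatedRowWidthBytes;
        // Keep at least one row so a very wide row cannot stall the scan.
        long clamped = Math.max(1L, Math.min(rowsThatFit, MAX_ROWS_PER_BATCH));
        return (int) clamped;
    }
}
```

Under these assumptions, a 16 MiB budget with rows estimated at 1 KiB yields 16384 rows per batch, while the same budget with 64-byte rows is still held to the 32 * 1024 row cap.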