[
https://issues.apache.org/jira/browse/DRILL-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528015#comment-16528015
]
ASF GitHub Bot commented on DRILL-6147:
---------------------------------------
ilooner commented on issue #1330: DRILL-6147: Adding Columnar Parquet Batch
Sizing functionality
URL: https://github.com/apache/drill/pull/1330#issuecomment-401424230
@vrozov I discussed the logging with @sachouche. The issue is that some
information needs to be logged for QA purposes on a per-query basis, and for
ease of testing it needs to be toggled from sqlline. This was the approach
agreed on by the batch sizing team, so at this point we're coming in at the end
of a series of discussions. Nevertheless, I spent about half a day looking for
a way this use case could be met in a logback-centric fashion, and I could not
find a reasonable alternative that would not require a large amount of work.
The main issue is that slf4j / logback uses a global logging configuration,
which makes it impossible to turn logging on for a single query. We could use
MDC to include a query id in the log messages and then filter by query, but
even that is not ideal, since the logs you don't need are still generated for
all queries. Then there is the issue of updating the logging configuration. The
only logback-native method is to enable periodic scanning of the logback.xml
file and to update that file on every node when you want to enable certain
logging. That is not a great experience for someone automating tests, or
debugging.
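For reference, the logback-native route described above would look roughly like
this; the `queryId` MDC key and the scan period are assumptions for
illustration, not something Drill currently configures:

```xml
<!-- logback.xml: scan="true" makes logback re-read this file periodically,
     so edits take effect without a restart; %X{queryId} pulls the (assumed)
     query id from the MDC into each log line so output can be filtered. -->
<configuration scan="true" scanPeriod="30 seconds">
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%X{queryId}] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <root level="warn">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>
```

Even with this in place, every node's copy of the file would still need to be
edited to change what gets logged, which is the pain point described above.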
The conclusion we came to was that Drill needs a real logging solution, and
to really do it right you would have to go down the rabbit hole:
1. Enable per-query logging configurations, not just a global logging
configuration. To do this we would have to use thread-local loggers instead of
static loggers, or something like https://clogr.io/, or a custom
ContextSelector
(https://stackoverflow.com/questions/38208617/practical-use-of-logback-context-selectors).
2. Allow configuration of log levels from either sqlline or the web UI
(like Apex had).
3. Include a logging context with the query context to communicate the
logging config for a query.
These changes are non-trivial, have a lot of corner cases, and require design
discussions with the community.
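To make item 1 concrete, here is a minimal, self-contained sketch of the
thread-local idea: a thread-local context carries the query id, and a filter
decides per query whether a message is kept. All class and method names here
are hypothetical illustrations, not Drill or slf4j APIs:

```java
import java.util.ArrayList;
import java.util.List;

public class QueryLogContext {
    // Thread-local query id, analogous to an MDC entry set per fragment thread.
    private static final ThreadLocal<String> QUERY_ID = new ThreadLocal<>();

    private final List<String> sink = new ArrayList<>();
    private final String enabledQueryId; // only this query's logs are kept

    public QueryLogContext(String enabledQueryId) {
        this.enabledQueryId = enabledQueryId;
    }

    public static void setQueryId(String id) { QUERY_ID.set(id); }
    public static void clearQueryId() { QUERY_ID.remove(); }

    // Analogous to an MDC-based filter: drop messages whose query id
    // does not match the one we want to debug.
    public void log(String message) {
        String id = QUERY_ID.get();
        if (enabledQueryId.equals(id)) {
            sink.add("[" + id + "] " + message);
        }
    }

    public List<String> messages() { return sink; }

    public static void main(String[] args) {
        QueryLogContext log = new QueryLogContext("q1");

        setQueryId("q1");
        log.log("batch sized to 1024 rows"); // kept: q1 is enabled
        clearQueryId();

        setQueryId("q2");
        log.log("noisy message");            // dropped: q2 is not enabled
        clearQueryId();

        System.out.println(log.messages());
    }
}
```

A real implementation would also have to propagate the context across the
threads a query fans out to, which is where much of the corner-case work lies.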
In the interim, Salim is making the options controlling logging internal, so
users will not see them. He will then have a follow-up PR to consolidate the
logging across all the batch sizing operators. Later, when we have a real
logging solution, we can delete these options and do the right thing.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Limit batch size for Flat Parquet Reader
> ----------------------------------------
>
> Key: DRILL-6147
> URL: https://issues.apache.org/jira/browse/DRILL-6147
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Parquet
> Reporter: salim achouche
> Assignee: salim achouche
> Priority: Major
> Fix For: 1.14.0
>
>
> The Parquet reader currently uses a hard-coded batch size limit (32k rows)
> when creating scan batches; there is no parameter nor any logic for
> controlling the amount of memory used. This enhancement will allow Drill to
> take an extra input parameter to control direct memory usage.
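The quoted issue describes replacing the hard-coded 32k-row cap with a
memory-driven limit. A minimal sketch of that idea, assuming a per-batch byte
budget and a row-width estimate (both parameter names are hypothetical; only
the 32k default comes from the issue text):

```java
public class BatchSizer {
    static final int MAX_ROWS = 32 * 1024; // current hard-coded limit

    // Given a direct-memory budget and an estimated row width in bytes,
    // pick a per-batch row limit: budget-driven, but never above the old
    // 32k cap and never below one row.
    static int rowLimit(long budgetBytes, long estimatedRowBytes) {
        long rows = budgetBytes / Math.max(1, estimatedRowBytes);
        return (int) Math.max(1, Math.min(MAX_ROWS, rows));
    }

    public static void main(String[] args) {
        System.out.println(rowLimit(16 * 1024 * 1024, 1024)); // 16384
        System.out.println(rowLimit(1024L * 1024 * 1024, 8)); // 32768 (capped)
    }
}
```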
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)