[
https://issues.apache.org/jira/browse/DRILL-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528015#comment-16528015
]
ASF GitHub Bot commented on DRILL-6147:
---------------------------------------
ilooner commented on issue #1330: DRILL-6147: Adding Columnar Parquet Batch
Sizing functionality
URL: https://github.com/apache/drill/pull/1330#issuecomment-401424230
@vrozov I discussed the logging with @sachouche. The issue is that some
information needs to be logged for QA purposes on a per-query basis, and for
ease of testing it needs to be toggled from sqlline. This was the approach
agreed on by the batch sizing team, so at this point we're coming in at the end
of a series of discussions. Nevertheless, I spent about half a day looking for
a way this use case could be met in a logback-centric fashion, and I could not
find a reasonable alternative that would not require a large amount of work.
The main issue is that slf4j / logback uses a global logging configuration,
which makes it impossible to turn logging on for a single query. We could use
MDC to include a query id in the log messages and then filter by query, but
even that is not ideal, since the logs you don't need are still generated for
all queries. Then there is the issue of updating the logging configuration. The
only logback-native method is to enable periodic scanning of the logback.xml
file and to update that file on every node when you want to enable certain
logging. That is not a great experience for someone automating tests, or
debugging.
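For reference, the logback-native route described above would look roughly like
this; the `queryId` MDC key and the scan period are assumptions for
illustration, not something Drill currently configures:

```xml
<!-- logback.xml: scan="true" makes logback re-read this file periodically,
     so edits take effect without a restart; %X{queryId} pulls the (assumed)
     query id from the MDC into each log line so output can be filtered. -->
<configuration scan="true" scanPeriod="30 seconds">
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%X{queryId}] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <root level="warn">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>
```

Even with this in place, every node's copy of the file would still need to be
edited to change what gets logged, which is the pain point described above.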
The conclusion we came to was that Drill needs a real logging solution, and
to really do it right you would have to go down the rabbit hole:
1. Enable per-query logging configurations, not just a global logging
configuration. To do this we would have to use thread-local loggers instead of
static loggers, or something like https://clogr.io/, or a custom
ContextSelector
(https://stackoverflow.com/questions/38208617/practical-use-of-logback-context-selectors).
2. Allow configuration of log levels from either sqlline or the web UI
(like Apex had).
3. Include a logging context with the query context to communicate the
logging config for a query.
These changes are non-trivial, have a lot of corner cases, and require design
discussions with the community.
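To make item 1 concrete, here is a minimal, self-contained sketch of the
thread-local idea: a thread-local context carries the query id, and a filter
decides per query whether a message is kept. All class and method names here
are hypothetical illustrations, not Drill or slf4j APIs:

```java
import java.util.ArrayList;
import java.util.List;

public class QueryLogContext {
    // Thread-local query id, analogous to an MDC entry set per fragment thread.
    private static final ThreadLocal<String> QUERY_ID = new ThreadLocal<>();

    private final List<String> sink = new ArrayList<>();
    private final String enabledQueryId; // only this query's logs are kept

    public QueryLogContext(String enabledQueryId) {
        this.enabledQueryId = enabledQueryId;
    }

    public static void setQueryId(String id) { QUERY_ID.set(id); }
    public static void clearQueryId() { QUERY_ID.remove(); }

    // Analogous to an MDC-based filter: drop messages whose query id
    // does not match the one we want to debug.
    public void log(String message) {
        String id = QUERY_ID.get();
        if (enabledQueryId.equals(id)) {
            sink.add("[" + id + "] " + message);
        }
    }

    public List<String> messages() { return sink; }

    public static void main(String[] args) {
        QueryLogContext log = new QueryLogContext("q1");

        setQueryId("q1");
        log.log("batch sized to 1024 rows"); // kept: q1 is enabled
        clearQueryId();

        setQueryId("q2");
        log.log("noisy message");            // dropped: q2 is not enabled
        clearQueryId();

        System.out.println(log.messages());
    }
}
```

A real implementation would also have to propagate the context across the
threads a query fans out to, which is where much of the corner-case work lies.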
In the interim, Salim is making the options controlling logging internal, so
users will not see them. He will then have a follow-up PR to consolidate the
logging across all the batch sizing operators. Later, when we have a real
logging solution, we can delete these options and do the right thing.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Limit batch size for Flat Parquet Reader
> ----------------------------------------
>
> Key: DRILL-6147
> URL: https://issues.apache.org/jira/browse/DRILL-6147
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Parquet
> Reporter: salim achouche
> Assignee: salim achouche
> Priority: Major
> Fix For: 1.14.0
>
>
> The Parquet reader currently uses a hard-coded batch size limit (32k rows)
> when creating scan batches; there is no parameter nor any logic for
> controlling the amount of memory used. This enhancement will allow Drill to
> take an extra input parameter to control direct memory usage.
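The quoted issue describes replacing the hard-coded 32k-row cap with a
memory-driven limit. A minimal sketch of that idea, assuming a per-batch byte
budget and a row-width estimate (both parameter names are hypothetical; only
the 32k default comes from the issue text):

```java
public class BatchSizer {
    static final int MAX_ROWS = 32 * 1024; // current hard-coded limit

    // Given a direct-memory budget and an estimated row width in bytes,
    // pick a per-batch row limit: budget-driven, but never above the old
    // 32k cap and never below one row.
    static int rowLimit(long budgetBytes, long estimatedRowBytes) {
        long rows = budgetBytes / Math.max(1, estimatedRowBytes);
        return (int) Math.max(1, Math.min(MAX_ROWS, rows));
    }

    public static void main(String[] args) {
        System.out.println(rowLimit(16 * 1024 * 1024, 1024)); // 16384
        System.out.println(rowLimit(1024L * 1024 * 1024, 8)); // 32768 (capped)
    }
}
```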
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)