Hello,

For larger projects, the logging is sometimes broken up into namespaces
that each have their own (default) logging level. That way lower-level
sub-systems stay quiet in normal operation but can be “activated” when
debugging issues.
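
To make the idea concrete, here is a minimal sketch of per-namespace levels.
It assumes java.util.logging purely because it ships with the JDK; Parquet
itself logs through SLF4J, where the same effect comes from the backend's
configuration. The class name and the sub-system logger name are made up for
illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class NamespaceLevels {
    // Strong references are kept because java.util.logging holds loggers weakly.
    static final Logger PARQUET = Logger.getLogger("org.apache.parquet");
    // Hypothetical sub-system namespace, used only for illustration.
    static final Logger SUBSYSTEM = Logger.getLogger("org.apache.parquet.hadoop");

    public static void main(String[] args) {
        // Normal operation: the whole namespace only reports warnings and errors.
        PARQUET.setLevel(Level.WARNING);
        System.out.println(SUBSYSTEM.isLoggable(Level.INFO));    // false, inherited from parent
        System.out.println(SUBSYSTEM.isLoggable(Level.WARNING)); // true

        // Debugging: "activate" one sub-system without touching the rest.
        SUBSYSTEM.setLevel(Level.FINE); // FINE is roughly DEBUG in SLF4J terms
        System.out.println(SUBSYSTEM.isLoggable(Level.FINE));    // true
        System.out.println(PARQUET.isLoggable(Level.FINE));      // false, parent unchanged
    }
}
```

The sketch only checks isLoggable; to actually see FINE records on the
console with java.util.logging, the handler's level would need lowering too.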

This might make sense for Arrow as well? What does the Logger provide?

Cheers,
Maarten.



> On Jan 24, 2020, at 1:29 PM, David Mollitor <[email protected]> wrote:
> 
> Hello Ryan,
> 
> I appreciate you taking the time to share your thoughts.
> 
> I'd just like to point out that there is also TRACE level logging if
> Parquet requires greater granularity.
> 
> Furthermore, I'm not suggesting that there be an unbreakable rule that all
> logging must be DEBUG, but it should be the exception, not the rule.  It is
> more likely that the wrapping application would be responsible for logging
> at the INFO and WARN/ERROR level.  Something like:
> 
> try {
>   LOG.info("Using Parquet to read file {}", path);
>   avroParquetReader.read();
> } catch (Exception e) {
>   LOG.error("Failed to read Parquet file", e);
> }
> 
> This is a very normal setup and doesn't require any additional logging from
> the Parquet library itself.  Once I see an error with "Failed to read Parquet
> file", then I'm going to turn on DEBUG logging and try to reproduce the
> error.
> 
> Thanks,
> David
> 
> On Fri, Jan 24, 2020 at 12:01 PM Ryan Blue <[email protected]>
> wrote:
> 
>> I don't agree with the idea to convert all of Parquet's logs to DEBUG
>> level, but I do think that we can improve the levels of individual
>> messages.
>> 
>> If we convert all logs to debug, then turning on logs to see what Parquet
>> is doing would show everything from opening an input file to position
>> tracking in output files. That's way too much information, which is why we
>> use different log levels to begin with.
>> 
>> I think we should continue using log levels to distinguish between types of
>> information: error for errors, warn for recoverable errors that may or may
>> not indicate a problem, info for regular operations, and debug for extra
>> information if you're debugging the Parquet library. Following the common
>> convention enables people to choose what information they want instead of
>> mixing it all together.
>> 
>> If you want to only see error and warning logs from Parquet, then the right
>> way to do that is to configure your logger so that the level for
>> org.apache.parquet classes is warn. That's not to say I don't agree that we
>> can cut down on what is logged at info and clean it up; I just don't think
>> it's a good idea to abandon the idea of log levels to distinguish between
>> different information the user of a library will need.
>> 
>> On Fri, Jan 24, 2020 at 6:30 AM lukas nalezenec <[email protected]> wrote:
>> 
>>> Hi,
>>> I can help too.
>>> Lukas
>>> 
>>> On Fri, Jan 24, 2020 at 15:29, David Mollitor <[email protected]> wrote:
>>> 
>>>> Hello Team,
>>>> 
>>>> I am happy to do the work of reviewing all Parquet logging, but I need help
>>>> getting the work committed.
>>>> 
>>>> Fokko Driesprong has been a wonderful ally in helping me get incremental
>>>> improvements into Parquet, but I wonder if there's anyone else who can
>>>> share in the load.
>>>> 
>>>> Thanks,
>>>> David
>>>> 
>>>> On Thu, Jan 23, 2020 at 11:55 AM Michael Heuer <[email protected]> wrote:
>>>> 
>>>>> Hello David,
>>>>> 
>>>>> As I mentioned on PARQUET-1758, we have been frustrated by overly verbose
>>>>> logging in Parquet for a long time.  Various workarounds have been more or
>>>>> less successful, e.g.
>>>>> 
>>>>> https://github.com/bigdatagenomics/adam/issues/851
>>>>> 
>>>>> I would support a move making Parquet a silent partner.  :)
>>>>> 
>>>>>   michael
>>>>> 
>>>>> 
>>>>>> On Jan 23, 2020, at 10:25 AM, David Mollitor <[email protected]> wrote:
>>>>>> 
>>>>>> Hello Team,
>>>>>> 
>>>>>> I have been a consumer of Apache Parquet through Apache Hive for several
>>>>>> years now.  For a long time, logging in Parquet has been pretty painful.
>>>>>> Some of the logging was going to STDOUT and some was going to Log4J.
>>>>>> Overall, though, the framework has been too verbose, spewing many log
>>>>>> lines about internal details of Parquet I don't understand.
>>>>>> 
>>>>>> The logging has gotten a lot better with recent releases moving solidly
>>>>>> into SLF4J.  That is awesome and very welcome.  However, (opinion alert) I
>>>>>> think the logging is still too verbose.  I think Parquet should be a
>>>>>> silent partner in data processing.  If everything is going well, it should
>>>>>> be silent (DEBUG level logging).  If things are going wrong, it should
>>>>>> throw an Exception.
>>>>>> 
>>>>>> If an operator suspects Parquet is the issue (and that's rarely the first
>>>>>> thing to check), they can set the logging for all of the Loggers in the
>>>>>> entire Parquet package (org.apache.parquet) to DEBUG to get the required
>>>>>> information.  Not to mention, the less logging it does, the faster it
>>>>>> will be.
>>>>>> 
>>>>>> I've opened this discussion because I've got two PRs related to this
>>>>>> topic ready to go:
>>>>>> 
>>>>>> PARQUET-1758
>>>>>> PARQUET-1761
>>>>>> 
>>>>>> Thanks,
>>>>>> David
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>> 
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>> 
