[ 
https://issues.apache.org/jira/browse/PARQUET-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15057800#comment-15057800
 ] 

Michal Turek commented on PARQUET-401:
--------------------------------------

Hi, our libraries use generic dependency {{slf4j-api}} and applications 
concrete {{logback}}. I started using {{parquet-avro}} library in our 
application and found a possibly related issue that is not so obvious. So am 
adding a comment for considering during the refactoring.

{noformat}
        <dependency>
            <groupId>org.apache.parquet</groupId>
            <artifactId>parquet-avro</artifactId>
            <version>1.8.1</version>
        </dependency>
{noformat}

The following lines are appearing in stderr on close of Parquet file 
({{ParquetWriter<GenericRecord>}}).

{noformat}
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
{noformat}

After a few hours of debugging and frustration I found the following... 
{{parquet-avro}} depends on 
{{org.apache.parquet:parquet-format:2.3.0-incubating}} which is shadowing slf4j 
library. This means that Java CLASSPATH contains everything from slf4j twice - 
for example standard one {{org.slf4j.LoggerFactory}} and shadow 
{{parquet.org.slf4j.LoggerFactory}}.

The standard implementation is able to find {{logback}} and properly configure 
logging for the application code using {{logback.xml}}, the second one fails 
and displays the error log above (including improper package name).

After finding this it was easy to put a breakpoint to the correct place and 
find the caller, which is 
{{parquet.org.apache.thrift.transport.TIOStreamTransport}}.

I found commit 
https://github.com/apache/parquet-format/commit/145f4df56bacd388b77ac9ff62d0b7c3c1b4eab9
 and PARQUET-369 that already tries to solve this issue, but I don't think that 
adding shaded dependency to slf4j-nop is correct. This only disables logging 
for this library, including warning/error log message from 
{{TIOStreamTransport.close()}}. Very important information is effectively lost 
in such case.

Please consider to implement logging properly and depend only on non-shadowed 
{{slf4j-api}} and let the users of your libraries to decide which log messages 
to see and which not. I can attach an example source code if you want, but I 
think the issue is quite obvious. Please let me know in case of any questions.

> Deprecate Log and move to SLF4J Logger
> --------------------------------------
>
>                 Key: PARQUET-401
>                 URL: https://issues.apache.org/jira/browse/PARQUET-401
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.8.1
>            Reporter: Ryan Blue
>
> The current Log class is intended to allow swapping out logger back-ends, but 
> SLF4J already does this. It also doesn't expose as nice of an API as SLF4J, 
> which can handle formatting to avoid the cost of building log messages that 
> won't be used. I think we should deprecate the org.apache.parquet.Log class 
> and move to using SLF4J directly, instead of wrapping SLF4J (PARQUET-305).
> This will require deprecating the current Log class and replacing the current 
> uses of it with SLF4J.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to