I don't agree with the idea to convert all of Parquet's logs to DEBUG
level, but I do think that we can improve the levels of individual messages.

If we convert all logs to debug, then turning on logs to see what Parquet
is doing would show everything from opening an input file to position
tracking in output files. That's way too much information, which is why we
use different log levels to begin with.

I think we should continue using log levels to distinguish between types of
information: error for errors, warn for recoverable errors that may or may
not indicate a problem, info for regular operations, and debug for extra
information if you're debugging the Parquet library. Following the common
convention enables people to choose what information they want instead of
mixing it all together.

If you want to see only error and warning logs from Parquet, then the right
way to do that is to configure your logger so that the level for
org.apache.parquet classes is warn. I do agree that we can cut down on what
is logged at info and clean it up; I just don't think it's a good idea to
abandon log levels as the way to distinguish between the different kinds of
information the user of a library will need.
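
As a rough sketch, assuming Log4j 1.x is the backend bound to SLF4J and is
configured through a log4j.properties file (both are assumptions; other
backends such as Logback use their own syntax), that would look something
like:

```properties
# Show only WARN and ERROR from Parquet classes; other loggers are unaffected.
log4j.logger.org.apache.parquet=WARN
```

The logger name is just the package prefix, so the level applies to every
class under org.apache.parquet unless a more specific logger overrides it.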

On Fri, Jan 24, 2020 at 6:30 AM lukas nalezenec <[email protected]> wrote:

> Hi,
> I can help too.
> Lukas
>
> On Fri, Jan 24, 2020 at 15:29, David Mollitor <[email protected]> wrote:
>
> > Hello Team,
> >
> > I am happy to do the work of reviewing all Parquet logging, but I need help
> > getting the work committed.
> >
> > Fokko Driesprong has been a wonderful ally in helping me get incremental
> > improvements into Parquet, but I wonder if there's anyone else who can
> > share in the load.
> >
> > Thanks,
> > David
> >
> > On Thu, Jan 23, 2020 at 11:55 AM Michael Heuer <[email protected]> wrote:
> >
> > > Hello David,
> > >
> > > As I mentioned on PARQUET-1758, we have been frustrated by overly verbose
> > > logging in Parquet for a long time.  Various workarounds have been more or
> > > less successful, e.g.
> > >
> > > https://github.com/bigdatagenomics/adam/issues/851
> > >
> > > I would support a move making Parquet a silent partner.  :)
> > >
> > >    michael
> > >
> > >
> > > > On Jan 23, 2020, at 10:25 AM, David Mollitor <[email protected]> wrote:
> > > >
> > > > Hello Team,
> > > >
> > > > I have been a consumer of Apache Parquet through Apache Hive for several
> > > > years now.  For a long time, logging in Parquet has been pretty painful.
> > > > Some of the logging was going to STDOUT and some was going to Log4J.
> > > > Overall, though, the framework has been too verbose, spewing many log lines
> > > > about internal details of Parquet I don't understand.
> > > >
> > > > The logging has gotten a lot better with recent releases moving solidly
> > > > into SLF4J.  That is awesome and very welcome.  However, (opinion alert) I
> > > > think the logging is still too verbose.  I think Parquet should be a silent
> > > > partner in data processing.  If everything is going well, it should be
> > > > silent (DEBUG level logging).  If things are going wrong, it should throw
> > > > an Exception.
> > > >
> > > > If an operator suspects Parquet is the issue (and that's rarely the first
> > > > thing to check), they can set the logging for all of the Loggers in the
> > > > entire Parquet package (org.apache.parquet) to DEBUG to get the required
> > > > information.  Not to mention, the less logging it does, the faster it will
> > > > be.
> > > >
> > > > I've opened this discussion because I've got two PRs related to this topic
> > > > ready to go:
> > > >
> > > > PARQUET-1758
> > > > PARQUET-1761
> > > >
> > > > Thanks,
> > > > David
> > >
> > >
> >
>


-- 
Ryan Blue
Software Engineer
Netflix
