Hello Ryan,
I appreciate you taking the time to share your thoughts.
I'd just like to point out that there is also TRACE level logging if
Parquet requires greater granularity.
Furthermore, I'm not suggesting that there be an unbreakable rule that all
logging must be DEBUG, but it should be the exception, not the rule. It is
more likely the situation the the wrapping application would be responsible
for logging at the INFO and WARN/ERROR level. Something like....
try {
LOG.info("Using Parquet to read file {}", path);
avroParquetReader.read();
} catch (Exception e) {
LOG.error("Failed to read Parquet file", e);
}
This is a very normal setup and doesn't require any additional logging from
the Parquet library itself. Once I see an error with "Failed to re Parquet
file", then I'm going to turn on DEBUG logging and try to reproduce the
error.
Thanks,
David
On Fri, Jan 24, 2020 at 12:01 PM Ryan Blue <[email protected]>
wrote:
> I don't agree with the idea to convert all of Parquet's logs to DEBUG
> level, but I do think that we can improve the levels of individual
> messages.
>
> If we convert all logs to debug, then turning on logs to see what Parquet
> is doing would show everything from opening an input file to position
> tracking in output files. That's way too much information, which is why we
> use different log levels to begin with.
>
> I think we should continue using log levels to distinguish between types of
> information: error for errors, warn for recoverable errors that may or may
> not indicate a problem, info for regular operations, and debug for extra
> information if you're debugging the Parquet library. Following the common
> convention enables people to choose what information they want instead of
> mixing it all together.
>
> If you want to only see error and warning logs from Parquet, then the right
> way to do that is to configure your logger so that the level for
> org.apache.parquet classes is warn. That's not to say I don't agree that we
> can cut down on what is logged at info and clean it up; I just don't think
> it's a good idea to abandon the idea of log levels to distinguish between
> different information the user of a library will need.
>
> On Fri, Jan 24, 2020 at 6:30 AM lukas nalezenec <[email protected]> wrote:
>
> > Hi,
> > I can help too.
> > Lukas
> >
> > Dne pá 24. 1. 2020 15:29 uživatel David Mollitor <[email protected]>
> > napsal:
> >
> > > Hello Team,
> > >
> > > I am happy to do the work of reviewing all Parquet logging, but I need
> > help
> > > getting the work committed.
> > >
> > > Fokko Driesprong has been a wonderfully ally in helping me get
> > incremental
> > > improvements into Parquet, but I wonder if there's anyone else that can
> > > share in the load.
> > >
> > > Thanks,
> > > David
> > >
> > > On Thu, Jan 23, 2020 at 11:55 AM Michael Heuer <[email protected]>
> > wrote:
> > >
> > > > Hello David,
> > > >
> > > > As I mentioned on PARQUET-1758, we have been frustrated by overly
> > verbose
> > > > logging in Parquet for a long time. Various workarounds have been
> more
> > > or
> > > > less successful, e.g.
> > > >
> > > > https://github.com/bigdatagenomics/adam/issues/851 <
> > > > https://github.com/bigdatagenomics/adam/issues/851>
> > > >
> > > > I would support a move making Parquet a silent partner. :)
> > > >
> > > > michael
> > > >
> > > >
> > > > > On Jan 23, 2020, at 10:25 AM, David Mollitor <[email protected]>
> > > wrote:
> > > > >
> > > > > Hello Team,
> > > > >
> > > > > I have been a consumer of Apache Parquet through Apache Hive for
> > > several
> > > > > years now. For a long time, logging in Parquet has been pretty
> > > painful.
> > > > > Some of the logging was going to STDOUT and some was going to
> Log4J.
> > > > > Overall, though the framework has been too verbose, spewing many
> log
> > > > lines
> > > > > about internal details of Parquet I don't understand.
> > > > >
> > > > > The logging has gotten a lot better with recent releases moving
> > solidly
> > > > > into SLF4J. That is awesome and very welcomed. However, (opinion
> > > > alert) I
> > > > > think the logging is still too verbose. I think Parquet should be
> a
> > > > silent
> > > > > partner in data processing. If everything is going well, it should
> > be
> > > > > silent (DEBUG level logging). If things are going wrong, it should
> > > throw
> > > > > an Exception.
> > > > >
> > > > > If an operator suspects Parquet is the issue (and that's rarely the
> > > first
> > > > > thing to check), they can set the logging for all of the Loggers in
> > the
> > > > > entire Parquet package (org.apache.parquet) to DEBUG to get the
> > > required
> > > > > information. Not to mention, the less logging it does, the faster
> it
> > > > will
> > > > > be.
> > > > >
> > > > > I've opened this discussion because I've got two PRs related to
> this
> > > > topic
> > > > > ready to go:
> > > > >
> > > > > PARQUET-1758
> > > > > PARQUET-1761
> > > > >
> > > > > Thanks,
> > > > > David
> > > >
> > > >
> > >
> >
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>