Hello, For larger projects, the logging is sometimes broken up into namespaces that have their own (default) logging level. That way lower level sub-systems are quiet in normal operation but can be “activated” when debugging issues etc.
This might make sense for Arrow as well? What does the Logger provide? Cheers, Maarten. > On Jan 24, 2020, at 1:29 PM, David Mollitor <[email protected]> wrote: > > Hello Ryan, > > I appreciate you taking the time to share your thoughts. > > I'd just like to point out that there is also TRACE level logging if > Parquet requires greater granularity. > > Furthermore, I'm not suggesting that there be an unbreakable rule that all > logging must be DEBUG, but it should be the exception, not the rule. It is > more likely the situation the the wrapping application would be responsible > for logging at the INFO and WARN/ERROR level. Something like.... > > try { > LOG.info("Using Parquet to read file {}", path); > avroParquetReader.read(); > } catch (Exception e) { > LOG.error("Failed to read Parquet file", e); > } > > This is a very normal setup and doesn't require any additional logging from > the Parquet library itself. Once I see an error with "Failed to re Parquet > file", then I'm going to turn on DEBUG logging and try to reproduce the > error. > > Thanks, > David > > On Fri, Jan 24, 2020 at 12:01 PM Ryan Blue <[email protected]> > wrote: > >> I don't agree with the idea to convert all of Parquet's logs to DEBUG >> level, but I do think that we can improve the levels of individual >> messages. >> >> If we convert all logs to debug, then turning on logs to see what Parquet >> is doing would show everything from opening an input file to position >> tracking in output files. That's way too much information, which is why we >> use different log levels to begin with. >> >> I think we should continue using log levels to distinguish between types of >> information: error for errors, warn for recoverable errors that may or may >> not indicate a problem, info for regular operations, and debug for extra >> information if you're debugging the Parquet library. Following the common >> convention enables people to choose what information they want instead of >> mixing it all together. >> >> If you want to only see error and warning logs from Parquet, then the right >> way to do that is to configure your logger so that the level for >> org.apache.parquet classes is warn. That's not to say I don't agree that we >> can cut down on what is logged at info and clean it up; I just don't think >> it's a good idea to abandon the idea of log levels to distinguish between >> different information the user of a library will need. >> >> On Fri, Jan 24, 2020 at 6:30 AM lukas nalezenec <[email protected]> wrote: >> >>> Hi, >>> I can help too. >>> Lukas >>> >>> Dne pá 24. 1. 2020 15:29 uživatel David Mollitor <[email protected]> >>> napsal: >>> >>>> Hello Team, >>>> >>>> I am happy to do the work of reviewing all Parquet logging, but I need >>> help >>>> getting the work committed. >>>> >>>> Fokko Driesprong has been a wonderfully ally in helping me get >>> incremental >>>> improvements into Parquet, but I wonder if there's anyone else that can >>>> share in the load. >>>> >>>> Thanks, >>>> David >>>> >>>> On Thu, Jan 23, 2020 at 11:55 AM Michael Heuer <[email protected]> >>> wrote: >>>> >>>>> Hello David, >>>>> >>>>> As I mentioned on PARQUET-1758, we have been frustrated by overly >>> verbose >>>>> logging in Parquet for a long time. Various workarounds have been >> more >>>> or >>>>> less successful, e.g. >>>>> >>>>> https://github.com/bigdatagenomics/adam/issues/851 < >>>>> https://github.com/bigdatagenomics/adam/issues/851> >>>>> >>>>> I would support a move making Parquet a silent partner. :) >>>>> >>>>> michael >>>>> >>>>> >>>>>> On Jan 23, 2020, at 10:25 AM, David Mollitor <[email protected]> >>>> wrote: >>>>>> >>>>>> Hello Team, >>>>>> >>>>>> I have been a consumer of Apache Parquet through Apache Hive for >>>> several >>>>>> years now. For a long time, logging in Parquet has been pretty >>>> painful. >>>>>> Some of the logging was going to STDOUT and some was going to >> Log4J. >>>>>> Overall, though the framework has been too verbose, spewing many >> log >>>>> lines >>>>>> about internal details of Parquet I don't understand. >>>>>> >>>>>> The logging has gotten a lot better with recent releases moving >>> solidly >>>>>> into SLF4J. That is awesome and very welcomed. However, (opinion >>>>> alert) I >>>>>> think the logging is still too verbose. I think Parquet should be >> a >>>>> silent >>>>>> partner in data processing. If everything is going well, it should >>> be >>>>>> silent (DEBUG level logging). If things are going wrong, it should >>>> throw >>>>>> an Exception. >>>>>> >>>>>> If an operator suspects Parquet is the issue (and that's rarely the >>>> first >>>>>> thing to check), they can set the logging for all of the Loggers in >>> the >>>>>> entire Parquet package (org.apache.parquet) to DEBUG to get the >>>> required >>>>>> information. Not to mention, the less logging it does, the faster >> it >>>>> will >>>>>> be. >>>>>> >>>>>> I've opened this discussion because I've got two PRs related to >> this >>>>> topic >>>>>> ready to go: >>>>>> >>>>>> PARQUET-1758 >>>>>> PARQUET-1761 >>>>>> >>>>>> Thanks, >>>>>> David >>>>> >>>>> >>>> >>> >> >> >> -- >> Ryan Blue >> Software Engineer >> Netflix >>
