[ 
https://issues.apache.org/jira/browse/SPARK-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16118403#comment-16118403
 ] 

Steve Scaffidi commented on SPARK-4412:
---------------------------------------

This is also an issue in the version of Parquet distributed in CDH 5.x. In this 
case, I am using {{parquet-1.5.0-cdh5.8.4}} (sources available here: 
http://archive.cloudera.com/cdh5/cdh/5).

However, I've found a work-around for mapreduce jobs submitted via Hive. I'm 
sure this can be adapted for use with Spark as well.

* Add the following properties to your job's configuration (in my case, I added 
them to {{hive-site.xml}}, since adding them to {{mapred-site.xml}} didn't work):
  {code}
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Djava.util.logging.config.file=parquet-logging.properties</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Djava.util.logging.config.file=parquet-logging.properties</value>
  </property>
  <property>
    <name>mapreduce.child.java.opts</name>
    <value>-Djava.util.logging.config.file=parquet-logging.properties</value>
  </property>{code}

* Create a file named {{parquet-logging.properties}} with the following 
contents:
  {code}
# Note: I'm certain not every line here is necessary. I just added them to
# cover all possible class/facility names. You will want to tailor this to
# your needs.
.level=WARNING
java.util.logging.ConsoleHandler.level=WARNING

parquet.handlers=java.util.logging.ConsoleHandler
parquet.hadoop.handlers=java.util.logging.ConsoleHandler
org.apache.parquet.handlers=java.util.logging.ConsoleHandler
org.apache.parquet.hadoop.handlers=java.util.logging.ConsoleHandler

parquet.level=WARNING
parquet.hadoop.level=WARNING
org.apache.parquet.level=WARNING
org.apache.parquet.hadoop.level=WARNING{code}

* Add the file to the job. In Hive, this is most easily done like so:
  {code}ADD FILE /path/to/parquet-logging.properties;{code}

With this done, when you run your Hive queries, parquet should only log WARNING 
(and higher) level messages to the container stdout logs.
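For Spark itself (the subject of this ticket), the analogous approach would be to ship the properties file to the executors and point each JVM at it. A hedged sketch (exact flags depend on your deploy mode, and I haven't verified this end to end):
  {code}
spark-submit \
  --files /path/to/parquet-logging.properties \
  --conf "spark.driver.extraJavaOptions=-Djava.util.logging.config.file=parquet-logging.properties" \
  --conf "spark.executor.extraJavaOptions=-Djava.util.logging.config.file=parquet-logging.properties" \
  ...{code}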

> Parquet logger cannot be configured
> -----------------------------------
>
>                 Key: SPARK-4412
>                 URL: https://issues.apache.org/jira/browse/SPARK-4412
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.0, 1.3.1
>            Reporter: Jim Carroll
>
> The Spark ParquetRelation.scala code assumes that the parquet.Log class has 
> already been loaded. If ParquetRelation.enableLogForwarding executes before 
> the parquet.Log class is loaded, the code in enableLogForwarding has no 
> effect.
> ParquetRelation.scala attempts to override the parquet logger but, at least 
> currently (and if your application simply reads a parquet file before it does 
> anything else with Parquet), the parquet.Log class hasn't been loaded yet. 
> Therefore the code in ParquetRelation.enableLogForwarding has no effect. If 
> you look at the code in parquet.Log, there's a static initializer that needs 
> to run prior to enableLogForwarding, or whatever enableLogForwarding does 
> gets undone by this static initializer.
> The "fix" would be to force the static initializer to run in parquet.Log as 
> part of enableLogForwarding. 
> PR will be forthcoming.
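The class-loading ordering problem described above can be reproduced in a minimal, self-contained sketch. Here {{ParquetLogStandIn}} is a hypothetical stand-in for parquet.Log (not the real class): its static initializer reconfigures the logger, clobbering any configuration done before the class was loaded.
  {code:java}
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for parquet.Log: its static initializer
// (re)configures the "parquet" logger when the class is first loaded.
class ParquetLogStandIn {
    static {
        Logger.getLogger("parquet").setLevel(Level.INFO);
    }
}

public class StaticInitDemo {
    // Hold a strong reference so the logger isn't garbage-collected.
    static final Logger PARQUET = Logger.getLogger("parquet");

    public static void main(String[] args) throws Exception {
        // Analogous to enableLogForwarding: configure the logger first...
        PARQUET.setLevel(Level.WARNING);
        // ...then load the class; its static initializer undoes the setting.
        Class.forName("ParquetLogStandIn");
        System.out.println(PARQUET.getLevel()); // prints INFO, not WARNING
    }
}{code}
Forcing the static initializer to run first (e.g. via Class.forName) before applying the configuration, as the reporter suggests, avoids this.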



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
