[ 
https://issues.apache.org/jira/browse/SPARK-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511724#comment-16511724
 ] 

John Zhuge commented on SPARK-5152:
-----------------------------------

SPARK-7169 alleviated this issue. However, I still find the approach 
*spark.metrics.conf=s3://bucket/spark-metrics/graphite.properties* a little 
more convenient and cleaner. Compared to *spark.metrics.conf.** in SparkConf, a 
metrics config file groups the properties together, separate from the rest of 
the Spark properties. In my case, there are 10 properties. A separate file is 
easy to swap out for different users or for different purposes, especially in 
a self-service environment. I wish spark-submit could accept multiple 
'--properties-file' options.

The downside is that this adds one more dependency on hadoop-client in 
spark-core, besides the history server.

It is a pretty simple change. Let me know whether I can post a PR.
{code:java}
- case Some(f) => new FileInputStream(f)
+ case Some(f) =>
+   val hadoopPath = new Path(Utils.resolveURI(f))
+   Utils.getHadoopFileSystem(hadoopPath.toUri, new Configuration()).open(hadoopPath)
{code}
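
For context on why the original line fails, here is a minimal, standalone sketch 
that uses only the JDK (no Spark or Hadoop classes). It is not the proposed patch. 
The object SchemeDemo, the helpers schemeOf and isLocal, and the local path 
/etc/spark/metrics.properties are hypothetical names chosen for illustration. It 
assumes Unix-style paths; a Windows path with backslashes would not parse as a URI.

```scala
import java.io.File
import java.net.URI

object SchemeDemo {
  // Hypothetical helper: the URI scheme of `path`, or "file" when none is given.
  def schemeOf(path: String): String =
    Option(new URI(path).getScheme).getOrElse("file")

  // Hypothetical helper: true when a plain FileInputStream could open `path`.
  def isLocal(path: String): Boolean = schemeOf(path) == "file"

  def main(args: Array[String]): Unit = {
    val remote = "hdfs://host1.domain.com/path/metrics.properties"
    val local = "/etc/spark/metrics.properties" // assumed example path

    println(s"$remote -> scheme=${schemeOf(remote)}, local=${isLocal(remote)}")
    println(s"$local -> scheme=${schemeOf(local)}, local=${isLocal(local)}")

    // java.io.File has no notion of schemes: the whole string is treated as a
    // local path name, which is what FileInputStream then tries to open.
    println(new File(remote).getPath)
  }
}
```

On Unix-like systems the last line is expected to print the path with the double 
slash collapsed, which would match the "hdfs:/host1.domain.com/..." form seen in 
the FileNotFoundException quoted below. Going through a Hadoop FileSystem, as in 
the diff above, leaves the scheme handling to Hadoop instead.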
 

> Let metrics.properties file take an hdfs:// path
> ------------------------------------------------
>
>                 Key: SPARK-5152
>                 URL: https://issues.apache.org/jira/browse/SPARK-5152
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.2.0
>            Reporter: Ryan Williams
>            Priority: Major
>
> From my reading of [the 
> code|https://github.com/apache/spark/blob/06dc4b5206a578065ebbb6bb8d54246ca007397f/core/src/main/scala/org/apache/spark/metrics/MetricsConfig.scala#L53],
>  the {{spark.metrics.conf}} property must be a path that is resolvable on the 
> local filesystem of each executor.
> Running a Spark job with {{--conf 
> spark.metrics.conf=hdfs://host1.domain.com/path/metrics.properties}} logs 
> many errors (~1 per executor, presumably?) like:
> {code}
> 15/01/08 13:20:57 ERROR metrics.MetricsConfig: Error loading configure file
> java.io.FileNotFoundException: hdfs:/host1.domain.com/path/metrics.properties (No such file or directory)
>         at java.io.FileInputStream.open(Native Method)
>         at java.io.FileInputStream.<init>(FileInputStream.java:146)
>         at java.io.FileInputStream.<init>(FileInputStream.java:101)
>         at org.apache.spark.metrics.MetricsConfig.initialize(MetricsConfig.scala:53)
>         at org.apache.spark.metrics.MetricsSystem.<init>(MetricsSystem.scala:92)
>         at org.apache.spark.metrics.MetricsSystem$.createMetricsSystem(MetricsSystem.scala:218)
>         at org.apache.spark.SparkEnv$.create(SparkEnv.scala:329)
>         at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:181)
>         at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:131)
>         at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
>         at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>         at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:60)
>         at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:113)
>         at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:163)
>         at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
> {code}
> which seems consistent with the idea that it's looking on the local 
> filesystem and not parsing the "scheme" portion of the URL.
> Letting all executors get their {{metrics.properties}} files from one 
> location on HDFS would be an improvement, right?


