[ https://issues.apache.org/jira/browse/SPARK-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ryan Williams updated SPARK-5152:
---------------------------------
Description:
From my reading of [the code|https://github.com/apache/spark/blob/06dc4b5206a578065ebbb6bb8d54246ca007397f/core/src/main/scala/org/apache/spark/metrics/MetricsConfig.scala#L53], the {{spark.metrics.conf}} property must be a path that is resolvable on the local filesystem of each executor.
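As a minimal illustration of why the scheme never gets a chance (assuming the plain {{java.io.FileInputStream}} call at the linked line, which matches the stack trace below):
{code}
import java.io.FileInputStream

// java.io treats the whole string as a local path; the "hdfs://" prefix
// is never parsed as a URI scheme, so this throws FileNotFoundException
// unless a local file literally named "hdfs:/..." exists on the executor.
new FileInputStream("hdfs://host1.domain.com/path/metrics.properties")
{code}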
Running a Spark job with {{--conf spark.metrics.conf=hdfs://host1.domain.com/path/metrics.properties}} logs many errors (roughly one per executor) like:
{code}
15/01/08 13:20:57 ERROR metrics.MetricsConfig: Error loading configure file
java.io.FileNotFoundException: hdfs:/host1.domain.com/path/metrics.properties (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:146)
        at java.io.FileInputStream.<init>(FileInputStream.java:101)
        at org.apache.spark.metrics.MetricsConfig.initialize(MetricsConfig.scala:53)
        at org.apache.spark.metrics.MetricsSystem.<init>(MetricsSystem.scala:92)
        at org.apache.spark.metrics.MetricsSystem$.createMetricsSystem(MetricsSystem.scala:218)
        at org.apache.spark.SparkEnv$.create(SparkEnv.scala:329)
        at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:181)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:131)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:60)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:113)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:163)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
{code}
This is consistent with the executor treating the value as a local filesystem path and never parsing the "scheme" portion of the URL.
Letting all executors read their {{metrics.properties}} file from one location on HDFS would be an improvement.
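For what it's worth, here is a minimal sketch of what scheme-aware loading could look like, assuming Hadoop's {{FileSystem}} API (the object and method names below are hypothetical, not Spark's actual implementation):
{code}
import java.io.InputStream
import java.util.Properties

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object MetricsConfigSketch {
  /** Loads a properties file from any Hadoop-supported filesystem
    * (hdfs://, file://, ...) rather than only the local filesystem. */
  def loadProperties(pathStr: String, hadoopConf: Configuration): Properties = {
    val props = new Properties()
    val path = new Path(pathStr)
    // FileSystem.get resolves the filesystem from the URI's scheme; a
    // path with no scheme falls back to the default FS in hadoopConf.
    val fs = FileSystem.get(path.toUri, hadoopConf)
    var in: InputStream = null
    try {
      in = fs.open(path)
      props.load(in)
    } finally {
      if (in != null) in.close()
    }
    props
  }
}
{code}
With something like this in {{MetricsConfig.initialize}}, {{spark.metrics.conf=hdfs://host1.domain.com/path/metrics.properties}} would resolve against HDFS rather than each executor's local disk. One caveat: a scheme-less path would then resolve against the cluster's default filesystem, so purely local files would need a fully qualified {{file://}} URI to keep the current behavior.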
was:
From my reading of [the code|https://github.com/apache/spark/blob/06dc4b5206a578065ebbb6bb8d54246ca007397f/core/src/main/scala/org/apache/spark/metrics/MetricsConfig.scala#L53], the {{spark.metrics.conf}} property must be a path that is resolvable on the local filesystem of each executor.
Running a Spark job with {{--conf spark.metrics.conf=hdfs://.../metrics.properties}} logs many errors (roughly one per executor) like:
{code}
15/01/08 13:20:57 ERROR metrics.MetricsConfig: Error loading configure file
java.io.FileNotFoundException: hdfs:/demeter-nn1.demeter.hpc.mssm.edu/user/willir31/metrics.properties (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:146)
        at java.io.FileInputStream.<init>(FileInputStream.java:101)
        at org.apache.spark.metrics.MetricsConfig.initialize(MetricsConfig.scala:53)
        at org.apache.spark.metrics.MetricsSystem.<init>(MetricsSystem.scala:92)
        at org.apache.spark.metrics.MetricsSystem$.createMetricsSystem(MetricsSystem.scala:218)
        at org.apache.spark.SparkEnv$.create(SparkEnv.scala:329)
        at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:181)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:131)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:60)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:113)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:163)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
{code}
This is consistent with the executor treating the value as a local filesystem path and never parsing the "scheme" portion of the URL.
Letting all executors read their {{metrics.properties}} file from one location on HDFS would be an improvement.
> Let metrics.properties file take an hdfs:// path
> ------------------------------------------------
>
> Key: SPARK-5152
> URL: https://issues.apache.org/jira/browse/SPARK-5152
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 1.2.0
> Reporter: Ryan Williams
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]