[
https://issues.apache.org/jira/browse/SPARK-11227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yuri Saito updated SPARK-11227:
-------------------------------
Description:
When running a jar containing a Spark job on an HDFS HA cluster with Mesos and
Spark 1.5.1, the job throws "java.net.UnknownHostException: nameservice1" and
fails.
I run the following in a terminal:
{code}
/opt/spark/bin/spark-submit \
--class com.example.Job /jobs/job-assembly-1.0.0.jar
{code}
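For reference, here is a minimal sketch of the kind of job that hits this
(hypothetical; the source of the actual com.example.Job is not included in this
report). Any read through the HA logical URI goes through
SparkContext.hadoopFile, matching the stack trace below.
{code}
// Hypothetical minimal repro; the real com.example.Job is not shown here.
import org.apache.spark.{SparkConf, SparkContext}

object Job {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("nameservice-repro"))
    // "nameservice1" is the HA logical name from hdfs-site.xml,
    // not a resolvable hostname.
    val lines = sc.textFile("hdfs://nameservice1/tmp/sample.txt")
    println(lines.count())
    sc.stop()
  }
}
{code}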
The job then fails with the following stack trace:
{code}
15/10/21 15:22:12 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, spark003.example.com): java.lang.IllegalArgumentException: java.net.UnknownHostException: nameservice1
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:312)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:178)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:665)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:601)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:148)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169)
    at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:656)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:436)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:409)
    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1016)
    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1016)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:220)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: nameservice1
    ... 41 more
{code}
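The trace shows NameNodeProxies.createProxy falling through to
createNonHAProxy, i.e. the HDFS client does not recognize "nameservice1" as an
HA nameservice and tries to resolve it as a real hostname. A hedged diagnostic
sketch (my own addition, assuming the job can be modified) to check whether the
executors actually see the HA client settings:
{code}
// Diagnostic sketch: if these keys print null, the executor-side Hadoop
// Configuration is not picking up hdfs-site.xml, which would explain why
// createNonHAProxy is chosen instead of the HA failover proxy.
import org.apache.hadoop.conf.Configuration
import org.apache.spark.{SparkConf, SparkContext}

object ConfCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("conf-check"))
    val keys = Seq(
      "dfs.nameservices",
      "dfs.ha.namenodes.nameservice1",
      "dfs.client.failover.proxy.provider.nameservice1")
    // Run a single task so the check happens on an executor, not the driver.
    sc.parallelize(Seq(1), 1).foreach { _ =>
      val hadoopConf = new Configuration()
      keys.foreach(k => println(s"$k = ${hadoopConf.get(k)}"))
    }
    sc.stop()
  }
}
{code}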
However, when I switched the cluster from Spark 1.5.1 to Spark 1.4.0 and reran
the job, it completed successfully.
In addition, when I disabled High Availability on HDFS and reran the job, it
also completed successfully.
So I suspect Spark 1.5 and later have a bug here.
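If missing executor-side configuration turns out to be the cause, one
workaround sketch (an assumption on my part, not a confirmed fix) is to push
the HA client settings through Spark's spark.hadoop.* passthrough, which Spark
copies into the Hadoop Configuration it hands to jobs. The namenode ids and
host:port values below are placeholders; take them from your cluster's
hdfs-site.xml.
{code}
// Workaround sketch: propagate HDFS HA client settings via spark.hadoop.*.
// nn1/nn2 and the namenode hosts are hypothetical placeholders.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("ha-conf-workaround")
  .set("spark.hadoop.dfs.nameservices", "nameservice1")
  .set("spark.hadoop.dfs.ha.namenodes.nameservice1", "nn1,nn2")
  .set("spark.hadoop.dfs.namenode.rpc-address.nameservice1.nn1",
    "namenode1.example.com:8020")
  .set("spark.hadoop.dfs.namenode.rpc-address.nameservice1.nn2",
    "namenode2.example.com:8020")
  .set("spark.hadoop.dfs.client.failover.proxy.provider.nameservice1",
    "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider")
val sc = new SparkContext(conf)
{code}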
Note: I tried both of these packages on my cluster, and both fail:
* spark-1.5.1-bin-hadoop2.6.tgz
* spark-1.5.1-bin-without-hadoop.tgz
Only *spark-1.4.0-bin-hadoop2.6.tgz* succeeds.
> Spark 1.5+ HDFS HA mode throws java.net.UnknownHostException: nameservice1
> ------------------------------------------------------------------------
>
> Key: SPARK-11227
> URL: https://issues.apache.org/jira/browse/SPARK-11227
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.5.0, 1.5.1
> Environment: OS: CentOS 6.6
> Memory: 28G
> CPU: 8
> Mesos: 0.22.0
> HDFS: Hadoop 2.6.0-CDH5.4.0 (build by Cloudera Manager)
> Reporter: Yuri Saito
>