[
https://issues.apache.org/jira/browse/SPARK-18288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Elkhan Dadashov resolved SPARK-18288.
-------------------------------------
Resolution: Not A Problem
> SparkLauncher 2.0.1 version working inconsistently in yarn-client mode
> ----------------------------------------------------------------------
>
> Key: SPARK-18288
> URL: https://issues.apache.org/jira/browse/SPARK-18288
> Project: Spark
> Issue Type: Bug
> Components: Spark Submit
> Affects Versions: 2.0.1
> Environment: I'm running Spark 2.0.1 with Spark Launcher 2.0.1
> on a YARN cluster. Deploy mode is yarn-client.
> Reporter: Elkhan Dadashov
>
> I'm running Spark 2.0.1 with Spark Launcher 2.0.1 on a YARN cluster.
> I launch a map task which spawns a Spark job via
> SparkLauncher#startApplication().
> Deploy mode is yarn-client.
> I'm running on a Mac laptop.
> I have this snippet of code:
> {code:|borderStyle=solid}
> SparkAppHandle appHandle = sparkLauncher.startApplication();
> while (appHandle.getState() == null || !appHandle.getState().isFinal()) {
>     if (appHandle.getState() != null) {
>         // If the line below is commented out, then appState and appId cannot be retrieved.
>         log.info("while: Spark job state is : " + appHandle.getState());
>         if (appHandle.getAppId() != null) {
>             log.info("\t App id: " + appHandle.getAppId() + "\tState: " + appHandle.getState());
>         }
>     }
> }
> {code}
> The above snippet works fine: both the Spark job and the map task that
> spawns it complete successfully.
> But if I comment out the highlighted line (the first log.info call, marked
> by the comment in the snippet), then the Spark job launches and finishes
> successfully, but the map task hangs for a while (in Running state) and
> then fails with the exception below.
> I run the exact same code in the exact same environment, with only that one
> line commented out.
> When the highlighted line is commented out, I do not even see the second log
> line in stderr. It seems the appHandle never returns anything (neither app
> id nor app state), even though the Spark application starts, runs, and
> finishes successfully. In the same stderr I can see the Spark job's logs,
> its printed results, and the application report indicating status.
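
The loop in the snippet polls getState() continuously with no pause. SparkLauncher#startApplication also accepts SparkAppHandle.Listener instances, whose stateChanged and infoChanged callbacks fire when the handle is updated, so a caller can block until a final state arrives instead of spinning. The following is a minimal sketch of that waiting pattern only. So that it runs without Spark on the classpath, State, StateListener, awaitFinalState and runDemo are hypothetical stand-ins, not Spark APIs, and the report does not establish that this change avoids the hang described here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

public class LatchWaitSketch {

    /** Hypothetical stand-in for SparkAppHandle.State; only isFinal() matters here. */
    enum State {
        SUBMITTED(false), RUNNING(false), FINISHED(true);

        private final boolean isFinal;

        State(boolean isFinal) { this.isFinal = isFinal; }

        boolean isFinal() { return isFinal; }
    }

    /** Hypothetical stand-in for a state-change callback such as a handle listener. */
    interface StateListener {
        void stateChanged(State newState);
    }

    /**
     * Registers a listener with the given launcher, then blocks until a final
     * state is reported or the timeout elapses. Returns the last state seen,
     * or null if no state was ever reported.
     */
    static State awaitFinalState(Consumer<StateListener> launcher, long timeoutMs)
            throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        AtomicReference<State> last = new AtomicReference<>();
        launcher.accept(state -> {
            last.set(state);
            if (state.isFinal()) {
                done.countDown(); // wake the waiting thread exactly once
            }
        });
        done.await(timeoutMs, TimeUnit.MILLISECONDS); // no busy spinning
        return last.get();
    }

    /** Simulates an application that reports three states from another thread. */
    static State runDemo() throws InterruptedException {
        return awaitFinalState(listener -> {
            Thread app = new Thread(() -> {
                listener.stateChanged(State.SUBMITTED);
                listener.stateChanged(State.RUNNING);
                listener.stateChanged(State.FINISHED);
            });
            app.start();
        }, 5000);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Final state: " + runDemo());
    }
}
```

In real code the lambda passed to awaitFinalState would be replaced by a SparkAppHandle.Listener handed to startApplication, with the latch counted down once handle.getState().isFinal() is true.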
> You can see the exception below (this is from the stderr of the mapper
> container which launches Spark job):
> ---
> INFO: Communication exception: java.net.ConnectException: Call From <my-hostname>/10.3.8.118 to <my-hostname>:53567 failed on connection exception: java.net.ConnectException: Connection refused;
> Caused by: java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
> at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
> at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
> at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
> at org.apache.hadoop.ipc.Client.call(Client.java:1451)
> ... 5 more
> ---
> Nov 05, 2016 2:41:54 AM org.apache.hadoop.ipc.Client handleConnectionFailure
> INFO: Retrying connect to server: <my-hostname>/10.3.8.118:53567. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> Nov 05, 2016 2:41:54 AM org.apache.hadoop.mapred.Task run
> INFO: Communication exception: java.net.ConnectException: Call From <my-hostname>/10.3.8.118 to <my-hostname>:53567 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
> at org.apache.hadoop.ipc.Client.call(Client.java:1479)
> at org.apache.hadoop.ipc.Client.call(Client.java:1412)
> at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:242)
> at com.sun.proxy.$Proxy9.ping(Unknown Source)
> at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:767)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
> at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
> at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
> at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
> at org.apache.hadoop.ipc.Client.call(Client.java:1451)
> ... 5 more
> ---
> Nov 05, 2016 2:41:54 AM org.apache.hadoop.mapred.Task logThreadInfo
> INFO: Process Thread Dump: Communication exception
> 10 active threads
> Thread 24 (org.apache.hadoop.hdfs.PeerCache@4763c727):
> State: TIMED_WAITING
> Blocked count: 0
> Waited count: 79
> Stack:
> java.lang.Thread.sleep(Native Method)
> org.apache.hadoop.hdfs.PeerCache.run(PeerCache.java:255)
> org.apache.hadoop.hdfs.PeerCache.access$000(PeerCache.java:46)
> org.apache.hadoop.hdfs.PeerCache$1.run(PeerCache.java:124)
> java.lang.Thread.run(Thread.java:745)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)