[
https://issues.apache.org/jira/browse/SPARK-25715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16649281#comment-16649281
]
Hyukjin Kwon commented on SPARK-25715:
--------------------------------------
Please avoid setting the target version, which is usually reserved for
committers, and the fix version, which is usually set when the issue is
actually fixed.
> support configuration on rpc port and port range on yarn client mode
> --------------------------------------------------------------------
>
> Key: SPARK-25715
> URL: https://issues.apache.org/jira/browse/SPARK-25715
> Project: Spark
> Issue Type: Improvement
> Components: YARN
> Affects Versions: 2.3.0
> Reporter: Tank Sui
> Priority: Major
>
> When connecting to a YARN cluster directly with the YARN client from
> Kubernetes pods, the driver currently uses a random port.
> The error I came across:
> n has already exited with state FINISHED!
> 2018-10-11 14:50:54 ERROR TransportClient:233 - Failed to send RPC
> 7696103738206710019 to /10.200.103.58:52294: java.io.IOException: Connection
> reset by peer
> java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
> at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
> at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
> at sun.nio.ch.IOUtil.write(IOUtil.java:65)
> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
> at
> org.apache.spark.network.protocol.MessageWithHeader.copyByteBuf(MessageWithHeader.java:142)
> at
> org.apache.spark.network.protocol.MessageWithHeader.transferTo(MessageWithHeader.java:123)
> at
> io.netty.channel.socket.nio.NioSocketChannel.doWriteFileRegion(NioSocketChannel.java:355)
> at
> io.netty.channel.nio.AbstractNioByteChannel.doWrite(AbstractNioByteChannel.java:224)
> at
> io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:382)
> at
> io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:934)
> at
> io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:362)
> at
> io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:901)
> at
> io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1321)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
> at
> io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
> at
> io.netty.channel.ChannelOutboundHandlerAdapter.flush(ChannelOutboundHandlerAdapter.java:115)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
> at
> io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
> at io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:117)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
> at
> io.netty.channel.AbstractChannelHandlerContext.access$1500(AbstractChannelHandlerContext.java:38)
> at
> io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1129)
> at
> io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:1070)
> at
> io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
> at
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
> at
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
> at
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> at java.lang.Thread.run(Thread.java:748)
> main_awsbackup_presto_emc_init_spu: INFO **************** execute exception
> ******************
> main_awsbackup_presto_emc_init_spu: INFO job completed, env: awsbackup,
> site:presto_emc, table: spu mode: init failed!
> 2018-10-11 14:50:54 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint:91 -
> Sending RequestExecutors(0,0,Map(),Set()) to AM was unsuccessful
> java.io.IOException: Failed to send RPC 7696103738206710019 to
> /10.200.103.58:52294: java.io.IOException: Connection reset by peer
> at
> org.apache.spark.network.client.TransportClient.lambda$sendRpc$2(TransportClient.java:237)
> at
> io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
> at
> io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:500)
> at
> io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:479)
> at
> io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
> at
> io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:122)
> at
> io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64)
> at
> io.netty.channel.ChannelOutboundBuffer.safeFail(ChannelOutboundBuffer.java:679)
> at
> io.netty.channel.ChannelOutboundBuffer.remove0(ChannelOutboundBuffer.java:293)
> at
> io.netty.channel.ChannelOutboundBuffer.failFlushed(ChannelOutboundBuffer.java:616)
> at
> io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:744)
> at
> io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:945)
> at
> io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:362)
> at
> io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:901)
> at
> io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1321)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
> at
> io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
> at
> io.netty.channel.ChannelOutboundHandlerAdapter.flush(ChannelOutboundHandlerAdapter.java:115)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
> at
> io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
> at io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:117)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
> at
> io.netty.channel.AbstractChannelHandlerContext.access$1500(AbstractChannelHandlerContext.java:38)
> at
> io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1129)
> at
> io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:1070)
> at
> io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
> at
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
> at
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
> at
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
> at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
> at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
> at sun.nio.ch.IOUtil.write(IOUtil.java:65)
> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
> at
> org.apache.spark.network.protocol.MessageWithHeader.copyByteBuf(MessageWithHeader.java:142)
> at
> org.apache.spark.network.protocol.MessageWithHeader.transferTo(MessageWithHeader.java:123)
> at
> io.netty.channel.socket.nio.NioSocketChannel.doWriteFileRegion(NioSocketChannel.java:355)
> at
> io.netty.channel.nio.AbstractNioByteChannel.doWrite(AbstractNioByteChannel.java:224)
> at
> io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:382)
> at
> io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:934)
> ... 22 more
> 2018-10-11 14:50:54 ERROR Utils:91 - Uncaught exception in thread Yarn
> application state monitor
> org.apache.spark.SparkException: Exception thrown in awaitResult:
> at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
> at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
> at
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:567)
> at
> org.apache.spark.scheduler.cluster.YarnSchedulerBackend.stop(YarnSchedulerBackend.scala:95)
> at
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:155)
> at
> org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:508)
> at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1755)
> at
> org.apache.spark.SparkContext$$anonfun$stop$8.apply$mcV$sp(SparkContext.scala:1931)
> at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1360)
> at org.apache.spark.SparkContext.stop(SparkContext.scala:1930)
> at
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$MonitorThread.run(YarnClientSchedulerBackend.scala:112)
> Caused by: java.io.IOException: Failed to send RPC 7696103738206710019 to
> /10.200.103.58:52294: java.io.IOException: Connection reset by peer
> at
> org.apache.spark.network.client.TransportClient.lambda$sendRpc$2(TransportClient.java:237)
> at
> io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
> at
> io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:500)
> at
> io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:479)
> at
> io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
> at
> io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:122)
> at
> io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64)
> at
> io.netty.channel.ChannelOutboundBuffer.safeFail(ChannelOutboundBuffer.java:679)
> at
> io.netty.channel.ChannelOutboundBuffer.remove0(ChannelOutboundBuffer.java:293)
> at
> io.netty.channel.ChannelOutboundBuffer.failFlushed(ChannelOutboundBuffer.java:616)
> at
> io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:744)
> at
> io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:945)
> at
> io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:362)
> at
> io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:901)
> at
> io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1321)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
> at
> io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
> at
> io.netty.channel.ChannelOutboundHandlerAdapter.flush(ChannelOutboundHandlerAdapter.java:115)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
> at
> io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
> at io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:117)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
> at
> io.netty.channel.AbstractChannelHandlerContext.access$1500(AbstractChannelHandlerContext.java:38)
> at
> io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1129)
> at
> io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:1070)
> at
> io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
> at
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
> at
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
> at
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
> at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
> at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
> at sun.nio.ch.IOUtil.write(IOUtil.java:65)
> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
> at
> org.apache.spark.network.protocol.MessageWithHeader.copyByteBuf(MessageWithHeader.java:142)
> at
> org.apache.spark.network.protocol.MessageWithHeader.transferTo(MessageWithHeader.java:123)
> at
> io.netty.channel.socket.nio.NioSocketChannel.doWriteFileRegion(NioSocketChannel.java:355)
> at
> io.netty.channel.nio.AbstractNioByteChannel.doWrite(AbstractNioByteChannel.java:224)
> at
> io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:382)
> at
> io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:934)
> ... 22 more
> main_awsbackup_presto_emc_init_spu: ERROR An error occurred while calling
> z:org.apache.spark.api.python.PythonRDD.collectAndServe.
> : org.apache.spark.SparkException: Job 32 cancelled because SparkContext was
> shut down
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:837)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:835)
> at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
> at
> org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:835)
> at
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onStop(DAGScheduler.scala:1841)
> at org.apache.spark.util.EventLoop.stop(EventLoop.scala:83)
> at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1754)
> at
> org.apache.spark.SparkContext$$anonfun$stop$8.apply$mcV$sp(SparkContext.scala:1931)
> at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1360)
> at org.apache.spark.SparkContext.stop(SparkContext.scala:1930)
> at
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$MonitorThread.run(YarnClientSchedulerBackend.scala:112)
> at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:2055)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:2074)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:2099)
> at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:939)
> at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
> at org.apache.spark.rdd.RDD.collect(RDD.scala:938)
> at
> org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:162)
> at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> at py4j.Gateway.invoke(Gateway.java:282)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:238)
> at java.lang.Thread.run(Thread.java:748)
> Traceback (most recent call last):
> File "main.py", line 156, in handle_current_table
> extented=params.extented, params=params, rebuild_logger = rebuild_logger)
> File "main.py", line 62, in rebuild_index
> render_sql, mg2es_params, max_mongo_time,last_start_time =
> sql_helper.execute_sql()
> File "/repo/helper.py", line 305, in execute_sql
> writer.flush(target_mongo_config, query)
> File "/repo/util/sql/init_presto_spu/z.final.py", line 123, in flush
> fdf.foreachPartition(write)
> File "/usr/local/lib/python3.6/dist-packages/pyspark/sql/dataframe.py", line
> 529, in foreachPartition
> self.rdd.foreachPartition(f)
> File "/usr/local/lib/python3.6/dist-packages/pyspark/rdd.py", line 824, in
> foreachPartition
> self.mapPartitions(func).count() # Force evaluation
> File "/usr/local/lib/python3.6/dist-packages/pyspark/rdd.py", line 1073, in
> count
> return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
> File "/usr/local/lib/python3.6/dist-packages/pyspark/rdd.py", line 1064, in
> sum
> return self.mapPartitions(lambda x: [sum(x)]).fold(0, operator.add)
> File "/usr/local/lib/python3.6/dist-packages/pyspark/rdd.py", line 935, in
> fold
> vals = self.mapPartitions(func).collect()
> File "/usr/local/lib/python3.6/dist-packages/pyspark/rdd.py", line 834, in
> collect
> sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
> File "/usr/local/lib/python3.6/dist-packages/py4j/java_gateway.py", line
> 1257, in __call__
> answer, self.gateway_client, self.target_id, self.name)
> File "/usr/local/lib/python3.6/dist-packages/pyspark/sql/utils.py", line 63,
> in deco
> return f(*a, **kw)
> File "/usr/local/lib/python3.6/dist-packages/py4j/protocol.py", line 328, in
> get_return_value
> format(target_id, ".", name), value)
> py4j.protocol.Py4JJavaError: An error occurred while calling
> z:org.apache.spark.api.python.PythonRDD.collectAndServe.
> : org.apache.spark.SparkException: Job 32 cancelled because SparkContext was
> shut down
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:837)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:835)
> at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
> at
> org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:835)
> at
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onStop(DAGScheduler.scala:1841)
> at org.apache.spark.util.EventLoop.stop(EventLoop.scala:83)
> at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1754)
> at
> org.apache.spark.SparkContext$$anonfun$stop$8.apply$mcV$sp(SparkContext.scala:1931)
> at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1360)
> at org.apache.spark.SparkContext.stop(SparkContext.scala:1930)
> at
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$MonitorThread.run(YarnClientSchedulerBackend.scala:112)
> at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:2055)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:2074)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:2099)
> at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:939)
> at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
> at org.apache.spark.rdd.RDD.collect(RDD.scala:938)
> at
> org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:162)
> at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> at py4j.Gateway.invoke(Gateway.java:282)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:238)
> at java.lang.Thread.run(Thread.java:748)
> 2018-10-11 14:50:54,919 [140236227806976] ERROR main.py.main.<module>:246 -
> 2018-10-11 14:50:54|awsbackup|2|An error occurred while calling
> z:org.apache.spark.api.python.PythonRDD.collectAndServe. :
> org.apache.spark.SparkException: Job 32 cancelled because SparkContext was
> shut down at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:837)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:835)
> at scala.collection.mutable.HashSet.foreach(HashSet.scala:78) at
> org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:835)
> at
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onStop(DAGScheduler.scala:1841)
> at org.apache.spark.util.EventLoop.stop(EventLoop.scala:83) at
> org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1754) at
> org.apache.spa| | |init|awsbackup presto_emc spu init
> batch|presto_emc|2018-10-11 12:49:33|fail|spu|121.3|7281| | | | | | |
> srch_data_es_log_awsbackup_presto_emc_spu: ERROR 2018-10-11
> 14:50:54|awsbackup|2|An error occurred while calling
> z:org.apache.spark.api.python.PythonRDD.collectAndServe. :
> org.apache.spark.SparkException: Job 32 cancelled because SparkContext was
> shut down at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:837)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:835)
> at scala.collection.mutable.HashSet.foreach(HashSet.scala:78) at
> org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:835)
> at
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onStop(DAGScheduler.scala:1841)
> at org.apache.spark.util.EventLoop.stop(EventLoop.scala:83) at
> org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1754) at
> org.apache.spa| | |init|awsbackup presto_emc spu init
> batch|presto_emc|2018-10-11 12:49:33|fail|spu|121.3|7281| | | || | |
> Traceback (most recent call last):
> File "main.py", line 226, in <module>
> handle_current_table(current_table=params.table, params=params)
> File "main.py", line 165, in handle_current_table
> raise e
> File "main.py", line 156, in handle_current_table
> extented=params.extented, params=params, rebuild_logger = rebuild_logger)
> File "main.py", line 62, in rebuild_index
> render_sql, mg2es_params, max_mongo_time,last_start_time =
> sql_helper.execute_sql()
> File "/repo/helper.py", line 305, in execute_sql
> writer.flush(target_mongo_config, query)
> File "/repo/util/sql/init_presto_spu/z.final.py", line 123, in flush
> fdf.foreachPartition(write)
> File "/usr/local/lib/python3.6/dist-packages/pyspark/sql/dataframe.py", line
> 529, in foreachPartition
> self.rdd.foreachPartition(f)
> File "/usr/local/lib/python3.6/dist-packages/pyspark/rdd.py", line 824, in
> foreachPartition
> self.mapPartitions(func).count() # Force evaluation
> File "/usr/local/lib/python3.6/dist-packages/pyspark/rdd.py", line 1073, in
> count
> return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
> File "/usr/local/lib/python3.6/dist-packages/pyspark/rdd.py", line 1064, in
> sum
> return self.mapPartitions(lambda x: [sum(x)]).fold(0, operator.add)
> File "/usr/local/lib/python3.6/dist-packages/pyspark/rdd.py", line 935, in
> fold
> vals = self.mapPartitions(func).collect()
> File "/usr/local/lib/python3.6/dist-packages/pyspark/rdd.py", line 834, in
> collect
> sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
> File "/usr/local/lib/python3.6/dist-packages/py4j/java_gateway.py", line
> 1257, in __call__
> answer, self.gateway_client, self.target_id, self.name)
> File "/usr/local/lib/python3.6/dist-packages/pyspark/sql/utils.py", line 63,
> in deco
> return f(*a, **kw)
> File "/usr/local/lib/python3.6/dist-packages/py4j/protocol.py", line 328, in
> get_return_value
> format(target_id, ".", name), value)
> py4j.protocol.Py4JJavaError: An error occurred while calling
> z:org.apache.spark.api.python.PythonRDD.collectAndServe.
> : org.apache.spark.SparkException: Job 32 cancelled because SparkContext was
> shut down
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:837)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:835)
> at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
> at
> org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:835)
> at
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onStop(DAGScheduler.scala:1841)
> at org.apache.spark.util.EventLoop.stop(EventLoop.scala:83)
> at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1754)
> at
> org.apache.spark.SparkContext$$anonfun$stop$8.apply$mcV$sp(SparkContext.scala:1931)
> at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1360)
> at org.apache.spark.SparkContext.stop(SparkContext.scala:1930)
> at
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$MonitorThread.run(YarnClientSchedulerBackend.scala:112)
> at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:2055)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:2074)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:2099)
> at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:939)
> at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
> at org.apache.spark.rdd.RDD.collect(RDD.scala:938)
> at
> org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:162)
> at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> at py4j.Gateway.invoke(Gateway.java:282)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:238)
> at java.lang.Thread.run(Thread.java:748)
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
> File "main.py", line 247, in <module>
> raise RuntimeError(str(e))
> RuntimeError: An error occurred while calling
> z:org.apache.spark.api.python.PythonRDD.collectAndServe.
> : org.apache.spark.SparkException: Job 32 cancelled because SparkContext was
> shut down
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:837)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:835)
> at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
> at
> org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:835)
> at
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onStop(DAGScheduler.scala:1841)
> at org.apache.spark.util.EventLoop.stop(EventLoop.scala:83)
> at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1754)
> at
> org.apache.spark.SparkContext$$anonfun$stop$8.apply$mcV$sp(SparkContext.scala:1931)
> at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1360)
> at org.apache.spark.SparkContext.stop(SparkContext.scala:1930)
> at
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$MonitorThread.run(YarnClientSchedulerBackend.scala:112)
> at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:2055)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:2074)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:2099)
> at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:939)
> at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
> at org.apache.spark.rdd.RDD.collect(RDD.scala:938)
> at
> org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:162)
> at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> at py4j.Gateway.invoke(Gateway.java:282)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:238)
> at java.lang.Thread.run(Thread.java:748)
> *Failed to send RPC 7696103738206710019 to /10.200.103.58:52294:
> java.io.IOException: Connection reset by peer*
> *10.200.103.58 is the pod's host IP and 52294 is a random port; it is not
> configurable for a Kubernetes deployment.*
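To pull the failing endpoint out of logs like the ones above, a minimal sketch follows. The function name and the regular expression are illustrative assumptions, not part of Spark; the pattern only matches the "Failed to send RPC ... to /host:port" message format shown in this report.

```python
import re

# Matches the TransportClient failure message as it appears in the log above.
RPC_FAILURE = re.compile(
    r"Failed to send RPC (?P<rpc_id>\d+) to /(?P<host>[\d.]+):(?P<port>\d+)"
)


def parse_rpc_failure(line):
    """Return (rpc_id, host, port) for a failure line, or None if it does not match."""
    match = RPC_FAILURE.search(line)
    if match is None:
        return None
    return match.group("rpc_id"), match.group("host"), int(match.group("port"))


if __name__ == "__main__":
    sample = (
        "java.io.IOException: Failed to send RPC 7696103738206710019 to "
        "/10.200.103.58:52294: java.io.IOException: Connection reset by peer"
    )
    print(parse_rpc_failure(sample))
```

Collecting the port from several failed runs is one way to confirm that it changes between runs rather than following any configured value.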
> My code is attached:
> {code:python}
> from pyspark import SparkConf, SparkContext
> from pyspark.sql import SparkSession
>
> # spark_config and self.params come from the surrounding application code.
> conf = SparkConf()
> conf.set('spark.app.name', self.params.job_id)
> # Bind on all interfaces inside the pod, but advertise the configured host.
> conf.set('spark.driver.bindAddress', '0.0.0.0')
> conf.set('spark.driver.host', spark_config.get('driver_host'))
> conf.set('spark.driver.port', spark_config.get('driver_port'))
> conf.set('spark.driver.blockManager.port',
>          spark_config.get('driver_blockManager_port'))
> conf.set('spark.executor.cores', '12')
> conf.set('spark.executor.memory', '50G')
> conf.set('spark.executor.instances', '10')
> conf.set('spark.jars.packages',
>          'org.mongodb.spark:mongo-spark-connector_2.11:2.3.0')
> conf.set('spark.hadoop.yarn.resourcemanager.address',
>          '{spark_host}:8032'.format(spark_host=spark_config.get('master_host')))
> conf.set('spark.hadoop.yarn.resourcemanager.hostname',
>          spark_config.get('master_host'))
> conf.set('spark.yarn.access.namenodes',
>          'hdfs://{spark_host}:8020'.format(spark_host=spark_config.get('master_host')))
> conf.set('spark.yarn.stagingDir',
>          'hdfs://{spark_host}:8020/user/hadoop/'.format(spark_host=spark_config.get('master_host')))
> conf.set('spark.ui.port', '20041')
> spark_sc = SparkContext('yarn', conf=conf)
> spark = SparkSession(spark_sc)
> {code}
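For reference, a minimal sketch of how the driver-side ports set in the code above could be pinned to a known range, for example so that they can be exposed from a pod. The helper names are hypothetical. The keys spark.driver.port, spark.driver.blockManager.port and spark.port.maxRetries are existing Spark settings; with a non-zero base port, Spark retries on successive ports up to spark.port.maxRetries times. This sketch does not address the random port reported in the error above, which this issue describes as not configurable.

```python
def candidate_ports(base_port, max_retries):
    """Ports Spark may try for one service: base_port .. base_port + max_retries."""
    return list(range(base_port, base_port + max_retries + 1))


def fixed_port_settings(driver_port, block_manager_port, max_retries=0):
    """Build Spark conf entries that pin the driver-side ports (hypothetical helper)."""
    for name, port in (("driver", driver_port), ("block manager", block_manager_port)):
        # Keep the whole retry range inside the unprivileged port space.
        if not 1024 <= port <= 65535 - max_retries:
            raise ValueError("%s port %d is out of range" % (name, port))
    return {
        "spark.driver.port": str(driver_port),
        "spark.driver.blockManager.port": str(block_manager_port),
        "spark.port.maxRetries": str(max_retries),
    }


if __name__ == "__main__":
    # Example values only; each entry would be passed to conf.set(key, value).
    for key, value in fixed_port_settings(20040, 20050, max_retries=4).items():
        print(key, value)
    print(candidate_ports(20040, 4))
```

With max_retries left at 0, each service either binds to exactly the given port or fails, which makes a blocked port show up at startup instead of as a later connection reset.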
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)