[
https://issues.apache.org/jira/browse/SPARK-28242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444900#comment-17444900
]
Shefron Yudy commented on SPARK-28242:
--------------------------------------
I saw the same error in the SparkThriftServer process's log when I restarted all
datanodes of HDFS (Spark 2.4.0 and Hadoop 3.0.0). The log is as follows:
{code:java}
2021-11-16 13:52:11,044 ERROR [spark-listener-group-eventLog] scheduler.AsyncEventQueue:Listener EventLoggingListener threw an exception
java.io.IOException: All datanodes [DatanodeInfoWithStorage[10.121.23.101:1019,DS-90cb8066-8e5c-443f-804b-20c3ad01851b,DISK]] are bad. Aborting...
    at org.apache.hadoop.hdfs.DataStreamer.handleBadDatanode(DataStreamer.java:1561)
    at org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1495)
    at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1481)
    at org.apache.hadoop.hdfs.DataStreamer.processDatanodeErrorOrExternalError(DataStreamer.java:1256)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:667)
{code}
The event log is written normally again once I restart the SparkThriftServer. I
suggest that the EventLoggingListener's DFS writer reconnect after all datanodes
have stopped and then come back up.
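The suggested reconnect could look roughly like the sketch below: on an IOException from a dead write pipeline, discard the broken stream and open a fresh one before retrying. This is only an illustration of the idea; `ReconnectingWriter`, `LineWriter`, and `openWriter` are my own hypothetical names, not Spark's actual EventLoggingListener internals, and the demo simulates the failing stream instead of talking to HDFS.

```java
import java.io.IOException;
import java.util.function.Supplier;

// Hypothetical sketch: wrap the event-log writer so that a failed write
// reopens the underlying stream once and retries, instead of staying broken.
class ReconnectingWriter {
    interface LineWriter {
        void write(String line) throws IOException;
    }

    private final Supplier<LineWriter> openWriter; // e.g. reopens the HDFS event-log stream
    private LineWriter current;

    ReconnectingWriter(Supplier<LineWriter> openWriter) {
        this.openWriter = openWriter;
        this.current = openWriter.get();
    }

    // One reconnect attempt per failed write; a real fix would likely back off and retry.
    void writeWithReconnect(String line) throws IOException {
        try {
            current.write(line);
        } catch (IOException e) {
            current = openWriter.get(); // reconnect after "All datanodes ... are bad"
            current.write(line);
        }
    }

    // Simulates a pipeline that dies on the second write, then recovers.
    static int demo() {
        final int[] attempts = {0};
        ReconnectingWriter w = new ReconnectingWriter(() -> line -> {
            attempts[0]++;
            if (attempts[0] == 2) {
                throw new IOException("All datanodes ... are bad. Aborting...");
            }
        });
        try {
            w.writeWithReconnect("event-1"); // attempt 1 succeeds
            w.writeWithReconnect("event-2"); // attempt 2 fails, reconnect, attempt 3 succeeds
        } catch (IOException e) {
            throw new RuntimeException(e); // not expected in this demo
        }
        return attempts[0];
    }

    public static void main(String[] args) {
        System.out.println("write attempts: " + demo()); // 3
    }
}
```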
> DataStreamer keeps logging errors even after fixing writeStream output sink
> ---------------------------------------------------------------------------
>
> Key: SPARK-28242
> URL: https://issues.apache.org/jira/browse/SPARK-28242
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 2.4.3
> Environment: Hadoop 2.8.4
>
> Reporter: Miquel Canes
> Priority: Minor
> Labels: bulk-closed
>
> I have been testing what happens to a running structured streaming query that
> is writing to HDFS when all datanodes are down/stopped or the whole cluster is
> down (including the namenode).
> So I created a structured stream from Kafka to a file output sink on HDFS and
> tested some scenarios.
> We used a very simple stream:
> {code:java}
> spark.readStream()
> .format("kafka")
> .option("kafka.bootstrap.servers", "kafka.server:9092...")
> .option("subscribe", "test_topic")
> .load()
> .select(col("value").cast(DataTypes.StringType))
> .writeStream()
> .format("text")
> .option("path", "HDFS/PATH")
> .option("checkpointLocation", "checkpointPath")
> .start()
> .awaitTermination();{code}
>
> After stopping all the datanodes, the process starts logging an error that the
> datanodes are bad.
> That's correct...
> {code:java}
> 2019-07-03 15:55:00 [spark-listener-group-eventLog] ERROR
> org.apache.spark.scheduler.AsyncEventQueue:91 - Listener EventLoggingListener
> threw an exception java.io.IOException: All datanodes
> [DatanodeInfoWithStorage[10.2.12.202:50010,DS-d2fba01b-28eb-4fe4-baaa-4072102a2172,DISK]]
> are bad. Aborting... at
> org.apache.hadoop.hdfs.DataStreamer.handleBadDatanode(DataStreamer.java:1530)
> at
> org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1465)
> at
> org.apache.hadoop.hdfs.DataStreamer.processDatanodeError(DataStreamer.java:1237)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:657)
> {code}
> The problem is that even after starting the datanodes again, the process keeps
> logging the same error all the time.
> We checked, and the writeStream to HDFS recovered successfully after the
> datanodes were started again; the output sink worked without problems.
> I have tried several different HDFS configurations to make sure it is not a
> client-configuration problem, but I have no clue how to fix it.
> It seems that something is stuck indefinitely in an error loop.
>
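[Note: in both logs the exception comes from the event-logging listener, not the streaming sink itself. While the listener cannot reconnect on its own, one possible mitigation, which is my own assumption and not something suggested in this ticket, is to disable the event log in spark-defaults.conf so the listener never holds a long-lived HDFS output stream, at the cost of losing history-server logs:]

```shell
# spark-defaults.conf (assumption: history-server event logs can be sacrificed)
spark.eventLog.enabled  false
```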
--
This message was sent by Atlassian Jira
(v8.20.1#820001)