[ https://issues.apache.org/jira/browse/SPARK-28242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444900#comment-17444900 ]

Shefron Yudy commented on SPARK-28242:
--------------------------------------

I saw the same error in the SparkThriftServer process's log when I restarted all 
datanodes of HDFS (with Spark-2.4.0 and Hadoop-3.0.0). The log is as 
follows:

{code}
2021-11-16 13:52:11,044 ERROR [spark-listener-group-eventLog] scheduler.AsyncEventQueue: Listener EventLoggingListener threw an exception
java.io.IOException: All datanodes [DatanodeInfoWithStorage[10.121.23.101:1019,DS-90cb8066-8e5c-443f-804b-20c3ad01851b,DISK]] are bad. Aborting...
    at org.apache.hadoop.hdfs.DataStreamer.handleBadDatanode(DataStreamer.java:1561)
    at org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1495)
    at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1481)
    at org.apache.hadoop.hdfs.DataStreamer.processDatanodeErrorOrExternalError(DataStreamer.java:1256)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:667)
{code}

The event log becomes available again if I restart the SparkThriftServer. I 
suggest that the EventLoggingListener's DFS writer should reconnect after all 
datanodes stop and then start again later.
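The reconnect idea above could look roughly like the following. This is only a minimal sketch, not Spark's actual EventLoggingListener code: the class name ReconnectingWriter and the writer factory are hypothetical, standing in for something like `() -> fileSystem.create(eventLogPath)`. On an IOException (e.g. "All datanodes ... are bad"), it discards the broken stream, opens a fresh one, and retries the write once.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.function.Supplier;

// Hypothetical sketch: a writer wrapper that reopens its underlying stream
// after an IOException, so event logging can recover once datanodes return.
public class ReconnectingWriter extends Writer {
    private final Supplier<Writer> factory; // e.g. () -> fs.create(eventLogPath)
    private Writer current;

    public ReconnectingWriter(Supplier<Writer> factory) {
        this.factory = factory;
        this.current = factory.get();
    }

    @Override
    public void write(char[] buf, int off, int len) throws IOException {
        try {
            current.write(buf, off, len);
        } catch (IOException e) {
            // The write pipeline is broken: drop the old stream,
            // open a fresh one, and retry the write once.
            try { current.close(); } catch (IOException ignored) { }
            current = factory.get();
            current.write(buf, off, len);
        }
    }

    @Override public void flush() throws IOException { current.flush(); }
    @Override public void close() throws IOException { current.close(); }
}
```

A real fix would also need to bound the retries and avoid reopening on every event, but the one-shot reopen shows the shape of the recovery path.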

> DataStreamer keeps logging errors even after fixing writeStream output sink
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-28242
>                 URL: https://issues.apache.org/jira/browse/SPARK-28242
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.4.3
>         Environment: Hadoop 2.8.4
>  
>            Reporter: Miquel Canes
>            Priority: Minor
>              Labels: bulk-closed
>
> I have been testing what happens to a running structured streaming query that is 
> writing to HDFS when all datanodes are down/stopped or the whole cluster is down 
> (including the namenode).
> So I created a structured stream from Kafka to a file output sink on HDFS and 
> tested some scenarios.
> We used a very simple streaming job:
> {code:java}
> spark.readStream()
>     .format("kafka")
>     .option("kafka.bootstrap.servers", "kafka.server:9092...")
>     .option("subscribe", "test_topic")
>     .load()
>     .select(col("value").cast(DataTypes.StringType))
>     .writeStream()
>     .format("text")
>     .option("path", "HDFS/PATH")
>     .option("checkpointLocation", "checkpointPath")
>     .start()
>     .awaitTermination();{code}
>  
> After stopping all the datanodes, the process starts logging the error that the 
> datanodes are bad.
> That's correct...
> {code:java}
> 2019-07-03 15:55:00 [spark-listener-group-eventLog] ERROR org.apache.spark.scheduler.AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
> java.io.IOException: All datanodes [DatanodeInfoWithStorage[10.2.12.202:50010,DS-d2fba01b-28eb-4fe4-baaa-4072102a2172,DISK]] are bad. Aborting...
>     at org.apache.hadoop.hdfs.DataStreamer.handleBadDatanode(DataStreamer.java:1530)
>     at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1465)
>     at org.apache.hadoop.hdfs.DataStreamer.processDatanodeError(DataStreamer.java:1237)
>     at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:657)
> {code}
> The problem is that even after starting the datanodes again, the process keeps 
> logging the same error all the time.
> We checked, and the writeStream to HDFS recovered successfully after starting 
> the datanodes; the output sink worked again without problems.
> I have been trying some different HDFS configurations to make sure it's not a 
> client-configuration problem, but I have no clue how to fix it.
> It seems that something is stuck indefinitely in an error loop.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
