Miquel Canes created SPARK-28242:
------------------------------------

             Summary: DataStreamer keeps logging errors even after fixing 
writeStream output sink
                 Key: SPARK-28242
                 URL: https://issues.apache.org/jira/browse/SPARK-28242
             Project: Spark
          Issue Type: Bug
          Components: Structured Streaming
    Affects Versions: 2.4.3
         Environment: Hadoop 2.8.4

 
            Reporter: Miquel Canes


I have been testing what happens to a running Structured Streaming query that is writing to HDFS when all datanodes are down/stopped, or when the whole cluster is down (including the namenode).

So I created a structured stream from Kafka to a file output sink on HDFS and tested some scenarios.

We used a very simple streaming query:
{code:java}
spark.readStream()
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka.server:9092...")
    .option("subscribe", "test_topic")
    .load()
    .select(col("value").cast(DataTypes.StringType))
    .writeStream()
    .format("text")
    .option("path", "HDFS/PATH")
    .option("checkpointLocation", "checkpointPath")
    .start()
    .awaitTermination();
{code}
 

After stopping all the datanodes, the process starts logging an error saying the datanodes are bad.

That is expected:
{code:java}
2019-07-03 15:55:00 [spark-listener-group-eventLog] ERROR org.apache.spark.scheduler.AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
java.io.IOException: All datanodes [DatanodeInfoWithStorage[10.2.12.202:50010,DS-d2fba01b-28eb-4fe4-baaa-4072102a2172,DISK]] are bad. Aborting...
    at org.apache.hadoop.hdfs.DataStreamer.handleBadDatanode(DataStreamer.java:1530)
    at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1465)
    at org.apache.hadoop.hdfs.DataStreamer.processDatanodeError(DataStreamer.java:1237)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:657)
{code}
The problem is that even after the datanodes are started again, the process keeps logging the same error indefinitely.

We checked, and the writeStream to HDFS recovered successfully once the datanodes were back: the output sink worked again without problems.

I have been trying several different HDFS client configurations to make sure it is not a client-configuration problem, but with no luck so far.
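For illustration, the client-side pipeline-recovery properties in hdfs-site.xml are the sort of settings that affect DataStreamer behaviour after datanode failures. These are standard HDFS client properties, but the values below are only an example of what one might try, not a known fix:

{code:xml}
<!-- hdfs-site.xml on the client: controls how the write pipeline is rebuilt
     when a datanode in the pipeline fails. Values are illustrative only. -->
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>true</value>
</property>
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>DEFAULT</value>
</property>
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.best-effort</name>
  <value>true</value>
</property>
{code}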

It seems that something is stuck indefinitely in an error loop.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
