Miquel Canes created SPARK-28242:
------------------------------------
Summary: DataStreamer keeps logging errors even after fixing
writeStream output sink
Key: SPARK-28242
URL: https://issues.apache.org/jira/browse/SPARK-28242
Project: Spark
Issue Type: Bug
Components: Structured Streaming
Affects Versions: 2.4.3
Environment: Hadoop 2.8.4
Reporter: Miquel Canes
I have been testing what happens to a running Structured Streaming query that is
writing to HDFS when all datanodes are down/stopped, or when the whole cluster is
down (including the namenode).
So I created a structured stream from Kafka to a file output sink on HDFS and
tested some scenarios.
We used a very simple streaming query:
{code:java}
spark.readStream()
.format("kafka")
.option("kafka.bootstrap.servers", "kafka.server:9092...")
.option("subscribe", "test_topic")
.load()
.select(col("value").cast(DataTypes.StringType))
.writeStream()
.format("text")
.option("path", "HDFS/PATH")
.option("checkpointLocation", "checkpointPath")
.start()
.awaitTermination();{code}
After stopping all the datanodes, the process starts logging errors saying the
datanodes are bad.
That's expected:
{code:java}
2019-07-03 15:55:00 [spark-listener-group-eventLog] ERROR
org.apache.spark.scheduler.AsyncEventQueue:91 - Listener EventLoggingListener
threw an exception java.io.IOException: All datanodes
[DatanodeInfoWithStorage[10.2.12.202:50010,DS-d2fba01b-28eb-4fe4-baaa-4072102a2172,DISK]]
are bad. Aborting... at
org.apache.hadoop.hdfs.DataStreamer.handleBadDatanode(DataStreamer.java:1530)
at
org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1465)
at
org.apache.hadoop.hdfs.DataStreamer.processDatanodeError(DataStreamer.java:1237)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:657)
{code}
The problem is that even after restarting the datanodes, the process keeps
logging the same error over and over.
We checked, and the writeStream to HDFS recovered successfully once the
datanodes were back, and the output sink worked again without problems.
I have tried several different HDFS configurations to rule out a client-side
config problem, but without finding a fix.
It seems that something is stuck indefinitely in an error loop.
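Note that the stack trace above comes from the EventLoggingListener, not from the file sink itself, so a possible workaround (an assumption on my part, not a verified fix) is to disable the event log, or point it somewhere other than HDFS, so the stuck DataStreamer is taken out of the picture:

{code:none}
# spark-defaults.conf -- hypothetical mitigation sketch, not a fix for the bug.
# The streaming sink recovers on its own; only the event log writer seems stuck.
spark.eventLog.enabled  false

# Or keep event logging but target the local filesystem instead of HDFS:
# spark.eventLog.enabled  true
# spark.eventLog.dir      file:///var/log/spark-events
{code}

The underlying issue (the event log's output stream never recovering after the pipeline error) would of course still need a fix in Spark or the HDFS client.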
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]