Venkata krishnan Sowrirajan created SPARK-36810:
---------------------------------------------------
Summary: Handle HDFS read inconsistencies on Spark when observer
Namenode is used
Key: SPARK-36810
URL: https://issues.apache.org/jira/browse/SPARK-36810
Project: Spark
Issue Type: Bug
Components: Spark Core, SQL
Affects Versions: 3.2.0
Reporter: Venkata krishnan Sowrirajan
In short, with HDFS HA and an [Observer
Namenode|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html],
read-after-write consistency is available only when both the write and the
read happen from the same client.
If the write happens on an executor and the read happens on the driver, the
driver's read may not see that write, which can cause the application to fail.
This can be fixed by calling `FileSystem.msync` before any read where the
client thinks the write could have happened elsewhere.
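
To make the failure mode concrete, here is a minimal toy model in Python. It is not the Hadoop API: `ToyNamespace`, `ToyClient`, and all of their methods are hypothetical names invented for illustration. It assumes, in simplified form, that each client remembers the last state id it has seen, that the observer answers a read only once it has caught up to that id, and that `msync` refreshes the client's id from the active Namenode.

```python
class ToyNamespace:
    """Hypothetical stand-in for an active Namenode plus one lagging observer."""

    def __init__(self):
        self.edits = []            # ordered log of paths created on the active
        self.observer_applied = 0  # number of edits the observer has replayed

    def write(self, path):
        """Apply a write on the active and return the new state id."""
        self.edits.append(path)
        return len(self.edits)

    def latest_state_id(self):
        """State id of the active after its most recent write."""
        return len(self.edits)

    def observer_exists(self, path, min_state_id):
        """Answer a read from the observer once it has reached min_state_id.

        A real observer would hold the call until it catches up; this toy
        version simply catches up on the spot.
        """
        if self.observer_applied < min_state_id:
            self.observer_applied = min_state_id
        return path in self.edits[:self.observer_applied]


class ToyClient:
    """Hypothetical client that tracks the last state id it has seen."""

    def __init__(self, namespace):
        self.namespace = namespace
        self.last_seen_state_id = 0

    def create(self, path):
        # Writes go to the active, so the writer learns the new state id.
        self.last_seen_state_id = self.namespace.write(path)

    def msync(self):
        # Ask the active for its latest state id without writing anything.
        self.last_seen_state_id = self.namespace.latest_state_id()

    def exists(self, path):
        # Reads go to the observer, bounded below by what this client has seen.
        return self.namespace.observer_exists(path, self.last_seen_state_id)


namespace = ToyNamespace()
executor = ToyClient(namespace)
driver = ToyClient(namespace)

executor.create("/out/part-0")
stale_read = driver.exists("/out/part-0")    # driver has not seen the write

driver.msync()
fresh_read = driver.exists("/out/part-0")    # observer must now catch up first

executor.create("/out/part-1")
same_client_read = executor.exists("/out/part-1")  # writer reads its own write

print(stale_read, fresh_read, same_client_read)
```

In this sketch the driver's first read misses the file because nothing has told it that the namespace moved forward. After `msync` the same read succeeds, while the executor, which performed the write itself, never needs the extra call.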
This issue is discussed in greater detail in this spark-dev
[discussion|https://mail-archives.apache.org/mod_mbox/spark-dev/202108.mbox/browser].