Venkata krishnan Sowrirajan created SPARK-36810:
---------------------------------------------------

             Summary: Handle HDFS read inconsistencies on Spark when observer 
Namenode is used
                 Key: SPARK-36810
                 URL: https://issues.apache.org/jira/browse/SPARK-36810
             Project: Spark
          Issue Type: Bug
          Components: Spark Core, SQL
    Affects Versions: 3.2.0
            Reporter: Venkata krishnan Sowrirajan


In short, with HDFS HA and the use of an [Observer 
NameNode|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html],
read-after-write consistency is only guaranteed when both the write and the 
read happen from the same client.

But if the write happens on an executor and the read happens on the driver, 
the driver's reads can be inconsistent (stale), causing application failures. 
This can be fixed by calling `FileSystem.msync`, which syncs the client's view 
with the Active NameNode, before any read where the client cannot rule out 
that the write happened in a different client.
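To make the failure mode concrete, here is a toy Python model, not HDFS or Spark code. It assumes the mechanism roughly as the Observer NameNode documentation describes it: each client remembers the latest state ID (transaction ID) it has seen, the Observer will not answer a read until it has caught up to that state ID, and msync fetches the Active's latest state ID. The class names, the `/out/...` paths and the on-demand catch-up are hypothetical simplifications; the actual fix in Spark would call Hadoop's Java `FileSystem.msync()`.

```python
from typing import List, Set, Tuple


class ActiveNameNode:
    """Toy Active NameNode: each namespace change gets the next transaction ID."""

    def __init__(self) -> None:
        self.txid = 0
        self.journal: List[Tuple[int, str]] = []  # (txid, path created)

    def create(self, path: str) -> int:
        self.txid += 1
        self.journal.append((self.txid, path))
        return self.txid

    def latest_state_id(self) -> int:
        # What msync fetches from the Active: its most recent transaction ID.
        return self.txid


class ObserverNameNode:
    """Toy Observer: serves reads from the journal edits it has applied so far."""

    def __init__(self, active: ActiveNameNode) -> None:
        self.active = active
        self.applied_txid = 0
        self.files: Set[str] = set()

    def catch_up_to(self, state_id: int) -> None:
        for txid, path in self.active.journal:
            if self.applied_txid < txid <= state_id:
                self.files.add(path)
                self.applied_txid = txid

    def exists(self, path: str, client_state_id: int) -> bool:
        # A real Observer defers the call until its edit tailer reaches the
        # client's state ID; this model simply applies the missing edits first.
        if self.applied_txid < client_state_id:
            self.catch_up_to(client_state_id)
        return path in self.files


class Client:
    """One HDFS client (the driver or an executor) with its own last-seen state ID."""

    def __init__(self, active: ActiveNameNode, observer: ObserverNameNode) -> None:
        self.active = active
        self.observer = observer
        self.last_seen_state_id = 0

    def create(self, path: str) -> None:
        # Writes go to the Active, which reports the new state ID back.
        self.last_seen_state_id = max(self.last_seen_state_id, self.active.create(path))

    def msync(self) -> None:
        # Models FileSystem.msync(): learn the Active's latest state ID.
        self.last_seen_state_id = max(self.last_seen_state_id, self.active.latest_state_id())

    def exists(self, path: str) -> bool:
        # Reads go to the Observer, tagged with this client's last-seen state ID.
        return self.observer.exists(path, self.last_seen_state_id)


active = ActiveNameNode()
observer = ObserverNameNode(active)
executor, driver = Client(active, observer), Client(active, observer)

executor.create("/out/_SUCCESS")
print(driver.exists("/out/_SUCCESS"))      # False: driver's state ID is stale
driver.msync()
print(driver.exists("/out/_SUCCESS"))      # True: Observer caught up first

executor.create("/out/part-00001")
print(executor.exists("/out/part-00001"))  # True: same client, read-after-write holds
```

The point the model illustrates: the Observer's catch-up check is only as good as the state ID the client sends, so a client that has not talked to the Active since another client's write has nothing to wait for. msync closes that gap at the cost of one extra RPC to the Active.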

This issue is discussed in more detail in this spark-dev mailing list 
[discussion|https://mail-archives.apache.org/mod_mbox/spark-dev/202108.mbox/browser].
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
