George Jahad created HDDS-5877:
----------------------------------

             Summary: listStatus() race condition when OZONE_OM_ENABLE_FILESYSTEM_PATHS disabled
                 Key: HDDS-5877
                 URL: https://issues.apache.org/jira/browse/HDDS-5877
             Project: Apache Ozone
          Issue Type: Bug
          Components: Ozone Filesystem
            Reporter: George Jahad


When OZONE_OM_ENABLE_FILESYSTEM_PATHS is disabled, listStatus() can fail to 
return an intermediate directory if the key whose path contains it hasn't yet 
been transferred from the cache to the keyTable.

When commitKey() is called, a key is first written to the keyTableCache and 
then to the keyTable when the double buffer is flushed.
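To make the ordering concrete, here is a toy model of the two-stage write in Java, the language Ozone itself uses. The class ToyKeyStore and its methods are hypothetical stand-ins, not Ozone's API: commitKey() only touches an in-memory cache, and flush() plays the role of the double-buffer flush that moves entries into the table.

```java
import java.util.TreeMap;

// Hypothetical toy model of the commit path; names are illustrative only
// and are not Ozone's actual API.
public class ToyKeyStore {
    // Stand-in for the keyTable cache: entries are visible right after commit.
    private final TreeMap<String, String> cache = new TreeMap<>();
    // Stand-in for the persisted keyTable: entries appear only after a flush.
    private final TreeMap<String, String> table = new TreeMap<>();

    public void commitKey(String key, String value) {
        cache.put(key, value);
    }

    // Simulated double-buffer flush. This toy simply clears the cache
    // afterwards; the real OM's cache-eviction policy is not modeled.
    public void flush() {
        table.putAll(cache);
        cache.clear();
    }

    public boolean inCache(String key) { return cache.containsKey(key); }

    public boolean inTable(String key) { return table.containsKey(key); }

    public static void main(String[] args) {
        ToyKeyStore store = new ToyKeyStore();
        store.commitKey("a/b/file", "data");
        // Between commit and flush, the key exists only in the cache.
        System.out.println(store.inCache("a/b/file") + " " + store.inTable("a/b/file")); // true false
        store.flush();
        System.out.println(store.inCache("a/b/file") + " " + store.inTable("a/b/file")); // false true
    }
}
```

The window between the two printed lines is where the race described in this issue lives.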

If OZONE_OM_ENABLE_FILESYSTEM_PATHS is enabled and the key has 
superdirectories, those superdirectories are added to the cache during the 
openKey() call, here: 
https://github.com/apache/ozone/blob/af5b48e4571e74fcad2e4546ff0efe79c3a96350/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/key/OMKeyCreateRequest.java#L270

If OZONE_OM_ENABLE_FILESYSTEM_PATHS is not enabled, the superdirectories are 
not added to the cache. In addition, the cache search method in listStatus() is 
written to skip any keys with an embedded "/", here: 
https://github.com/apache/ozone/blob/af5b48e4571e74fcad2e4546ff0efe79c3a96350/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java#L2285

So, if OZONE_OM_ENABLE_FILESYSTEM_PATHS is disabled and a key with an embedded 
"/" is committed and is still in the cache, listStatus() won't see either the 
key or its superdirectory.
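A simplified model of the two listing paths shows why the entry disappears. Everything here is an assumption for illustration: the helper names are hypothetical, directory keys are assumed to end with "/", and the cache filter is modeled only on the behavior described above, not copied from KeyManagerImpl.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model of the two listing paths; not the KeyManagerImpl code.
public class ListStatusRaceSketch {

    // Table path: every key under the prefix contributes its immediate child,
    // so "a/b/file" listed under "a/" yields the intermediate directory "a/b/".
    static TreeSet<String> childrenFromTable(Map<String, String> table, String prefix) {
        TreeSet<String> result = new TreeSet<>();
        for (String key : table.keySet()) {
            if (!key.startsWith(prefix) || key.equals(prefix)) continue;
            String rest = key.substring(prefix.length());
            int slash = rest.indexOf('/');
            result.add(slash < 0 ? key : prefix + rest.substring(0, slash + 1));
        }
        return result;
    }

    // Cache path: any key whose remaining part contains "/" is skipped.
    // Assumption: a single trailing "/" (a directory key) is ignored first.
    static TreeSet<String> childrenFromCache(Map<String, String> cache, String prefix) {
        TreeSet<String> result = new TreeSet<>();
        for (String key : cache.keySet()) {
            if (!key.startsWith(prefix) || key.equals(prefix)) continue;
            String rest = key.substring(prefix.length());
            if (rest.endsWith("/")) rest = rest.substring(0, rest.length() - 1);
            if (rest.contains("/")) continue; // "a/b/file" is dropped here
            result.add(key);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> cache = new TreeMap<>();
        cache.put("a/b/file", "data"); // committed but not yet flushed
        System.out.println(childrenFromCache(cache, "a/")); // []

        // Assumed: with filesystem paths enabled, a "a/b/" entry is cached too.
        cache.put("a/b/", "");
        System.out.println(childrenFromCache(cache, "a/")); // [a/b/]

        // Once flushed, the table path derives "a/b/" from the file alone.
        Map<String, String> table = new TreeMap<>();
        table.put("a/b/file", "data");
        System.out.println(childrenFromTable(table, "a/")); // [a/b/]
    }
}
```

The first listing returns nothing for the committed key; either caching the directory entry or scanning the flushed table surfaces "a/b/".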

Since the test passes once the key has been flushed to the table, the problem 
can be worked around by sleeping and retrying, but listStatus() itself should 
probably be modified so that a read from the cache behaves the same as a read 
from the keyTable.
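One possible shape of that change, sketched under the same toy assumptions: overlay the cache on the table and derive children once over the merged view, so a key lists the same way before and after the flush. UnifiedListStatusSketch is hypothetical, and it ignores cache delete markers and paging, which a real listStatus() would have to handle.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: merge cache over table, then apply one derivation
// so cached and flushed keys are treated identically.
// Delete markers in the cache are not modeled.
public class UnifiedListStatusSketch {

    static TreeSet<String> listStatus(Map<String, String> cache,
                                      Map<String, String> table, String prefix) {
        TreeMap<String, String> merged = new TreeMap<>(table);
        merged.putAll(cache); // cache entries are newer than table entries
        TreeSet<String> result = new TreeSet<>();
        for (String key : merged.keySet()) {
            if (!key.startsWith(prefix) || key.equals(prefix)) continue;
            String rest = key.substring(prefix.length());
            int slash = rest.indexOf('/');
            // A direct child is returned as-is; a deeper key is collapsed
            // to its intermediate directory under the prefix.
            result.add(slash < 0 ? key : prefix + rest.substring(0, slash + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> cache = new TreeMap<>();
        Map<String, String> table = new TreeMap<>();
        cache.put("a/b/file", "data"); // not yet flushed
        System.out.println(listStatus(cache, table, "a/")); // [a/b/]
        table.putAll(cache);            // simulate the flush
        cache.clear();
        System.out.println(listStatus(cache, table, "a/")); // [a/b/]
    }
}
```

A real implementation would more likely seek to the prefix (for example with TreeMap.tailMap) than scan every key, but the point is that one derivation rule serves both sources.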

This test exhibits the problem: 
TestOzoneFileSystem::testListStatusWithIntermediateDir: 
https://github.com/apache/ozone/blob/295492ec87429c9baeb12246c870231d35641535/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/fs/ozone/TestOzoneFileSystem.java#L588

Note that the test did not expose the problem until it was updated to stop 
using the pre-HA code path. That code doesn't use the cache, so the problem 
was never seen there.
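For the sleep/retry workaround mentioned above, a test could poll with a deadline instead of sleeping for a fixed time. PollUntil is a hypothetical helper, not part of the Ozone test utilities; in TestOzoneFileSystem the condition would be a listStatus() check for the intermediate directory. It only masks the race; it does not fix it.

```java
import java.util.function.BooleanSupplier;

// Hypothetical polling helper for the sleep/retry workaround.
public final class PollUntil {
    private PollUntil() {}

    // Returns true as soon as check holds, false once timeoutMillis elapses.
    public static boolean waitFor(BooleanSupplier check, long intervalMillis, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (check.getAsBoolean()) return true;
            if (System.nanoTime() - deadline >= 0) return false;
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulate a double-buffer flush that lands about 200 ms from now.
        long flushAt = System.currentTimeMillis() + 200;
        boolean visible = waitFor(() -> System.currentTimeMillis() >= flushAt, 20, 2000);
        System.out.println(visible);
    }
}
```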

Alternatively, listStatus() may simply not be supported when 
OZONE_OM_ENABLE_FILESYSTEM_PATHS is disabled, in which case it should perhaps 
throw an exception.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
