George Jahad created HDDS-5877:
----------------------------------
Summary: listStatus() race condition when
OZONE_OM_ENABLE_FILESYSTEM_PATHS disabled
Key: HDDS-5877
URL: https://issues.apache.org/jira/browse/HDDS-5877
Project: Apache Ozone
Issue Type: Bug
Components: Ozone Filesystem
Reporter: George Jahad
When OZONE_OM_ENABLE_FILESYSTEM_PATHS is disabled, listStatus() can fail to
return an intermediate directory if the key whose name contains it hasn't yet
been transferred from the cache to the keyTable.
When commitKey() is called, a key is first written to the keyTableCache and
then to the keyTable when the double buffer is flushed.
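The two-stage write described above can be pictured with a small toy model. This is only a sketch: ToyKeyStore and its fields are hypothetical names chosen for illustration, not the Ozone Manager's real classes, and the real cache eviction logic is more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: hypothetical names, not the real Ozone Manager types.
class ToyKeyStore {
    // Stand-in for the keyTableCache: holds keys that are committed but not yet flushed.
    final Map<String, String> keyTableCache = new TreeMap<>();
    // Stand-in for the persistent keyTable.
    final Map<String, String> keyTable = new TreeMap<>();
    // Stand-in for the double buffer: key names waiting to be flushed.
    private final List<String> pendingFlush = new ArrayList<>();

    // A commit is visible in the cache immediately, but not yet in the table.
    void commitKey(String keyName, String value) {
        keyTableCache.put(keyName, value);
        pendingFlush.add(keyName);
    }

    // A flush moves every pending key from the cache into the table.
    void flushDoubleBuffer() {
        for (String keyName : pendingFlush) {
            String value = keyTableCache.remove(keyName);
            if (value != null) {
                keyTable.put(keyName, value);
            }
        }
        pendingFlush.clear();
    }
}
```

Between commitKey() and flushDoubleBuffer() the key exists only in the cache, which is the window in which the problem described in this report can occur.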
If OZONE_OM_ENABLE_FILESYSTEM_PATHS is enabled, and the key has a
superdirectory, its superdirectories are added to the cache during the
openKey() call, here:
https://github.com/apache/ozone/blob/af5b48e4571e74fcad2e4546ff0efe79c3a96350/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/key/OMKeyCreateRequest.java#L270
If OZONE_OM_ENABLE_FILESYSTEM_PATHS is not enabled, the superdirectories are
not added to the cache. In addition, the cache search method in listStatus() is
written to skip any keys with an embedded "/", here:
https://github.com/apache/ozone/blob/af5b48e4571e74fcad2e4546ff0efe79c3a96350/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java#L2285
So, if OZONE_OM_ENABLE_FILESYSTEM_PATHS is disabled, and a key with an embedded
"/" is committed, and it is still in the cache, listStatus() won't see it or
its superdirectory.
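The asymmetry between the two read paths can be sketched as follows. This is a simplified model under stated assumptions, not the actual KeyManagerImpl code: ToyListStatus and its methods are hypothetical, only the root of the bucket is listed, and the table path is assumed to derive an intermediate directory from the key name.

```java
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy model only: hypothetical names, not the real KeyManagerImpl logic.
class ToyListStatus {
    // Table path (assumed behaviour): "dir1/key1" contributes its first path component, "dir1".
    static void addFromTable(Set<String> tableKeys, SortedSet<String> out) {
        for (String keyName : tableKeys) {
            int slash = keyName.indexOf('/');
            out.add(slash < 0 ? keyName : keyName.substring(0, slash));
        }
    }

    // Cache path: any key with an embedded "/" is skipped, so neither the key
    // nor its superdirectory is reported.
    static void addFromCache(Set<String> cacheKeys, SortedSet<String> out) {
        for (String keyName : cacheKeys) {
            if (keyName.contains("/")) {
                continue;
            }
            out.add(keyName);
        }
    }

    // Lists the entries directly under the root from both sources.
    static SortedSet<String> listRoot(Set<String> cacheKeys, Set<String> tableKeys) {
        SortedSet<String> out = new TreeSet<>();
        addFromCache(cacheKeys, out);
        addFromTable(tableKeys, out);
        return out;
    }
}
```

In this model, listing while "dir1/key1" is only in the cache returns nothing, while the same listing after the key has reached the table returns "dir1".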
Since the test below passes once the key is in the table, the problem can be
worked around by sleeping and retrying, but listStatus() itself should probably
be modified so that a read from the cache behaves the same as a read from the
keyTable.
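A minimal sketch of the sleep/retry workaround, for illustration only. RetryUntil is a hypothetical helper, not part of Ozone or Hadoop; in a test, the supplier would wrap the listStatus() call and the predicate would check for the expected directory.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not an Ozone or Hadoop API.
class RetryUntil {
    // Calls `call` up to maxAttempts times, sleeping between attempts,
    // and returns the first result accepted by `ok` (or the last result seen).
    static <T> T retryUntil(Supplier<T> call, Predicate<T> ok,
                            int maxAttempts, long sleepMillis) {
        T last = call.get();
        for (int attempt = 1; attempt < maxAttempts && !ok.test(last); attempt++) {
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up early.
                Thread.currentThread().interrupt();
                throw new RuntimeException("interrupted while retrying", e);
            }
            last = call.get();
        }
        return last;
    }
}
```

This only hides the window between the commit and the double buffer flush; it does not make the cache read consistent with the table read.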
This test exhibits the problem:
TestOzoneFileSystem::testListStatusWithIntermediateDir:
https://github.com/apache/ozone/blob/295492ec87429c9baeb12246c870231d35641535/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/fs/ozone/TestOzoneFileSystem.java#L588
Note that the test did not expose the problem until it was updated to stop
using the pre-HA code. (That code doesn't use the cache, so the problem was
never seen.)
It is also possible that listStatus() just shouldn't be used when
OZONE_OM_ENABLE_FILESYSTEM_PATHS is disabled. In that case, maybe it should
throw an exception.