Prashant Wason created HUDI-2016:
------------------------------------
Summary: Metadata table bootstrap does not work when there are
inflight instances
Key: HUDI-2016
URL: https://issues.apache.org/jira/browse/HUDI-2016
Project: Apache Hudi
Issue Type: Bug
Reporter: Prashant Wason
There is a race condition in metadata table bootstrap when there are inflight
instances.
Example: Assume a CLEAN is in progress and its clean plan schedules
p1/f1.parquet for deletion. If bootstrap runs at the same time, two cases
are possible:
# bootstrap lists files in partition p1 BEFORE clean deletes them
## hence p1/f1.parquet is added to metadata table during bootstrap
## When processing the CLEAN, p1/f1.parquet will be deleted from metadata table
# bootstrap lists files in partition p1 AFTER clean deletes them
## p1/f1.parquet is not found
## When processing the CLEAN, p1/f1.parquet will be deleted from metadata table
We cannot differentiate case 2.2 from the case where we missed adding
p1/f1.parquet to the metadata table.
The metadata reader code throws an exception to ensure that any file being
deleted was previously added to the metadata table. This exception is thrown
in case 2.2 above.
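
The two interleavings can be reproduced with a toy model. This is only a sketch under stated assumptions: MetadataTable, bootstrap, and run are hypothetical names made up for illustration and are not Hudi APIs, storage is modeled as a plain set of paths, and p1/f2.parquet is an invented second file so the table is not empty.

```python
class MetadataTable:
    """Toy stand-in (hypothetical) for the metadata table: a set of file paths."""

    def __init__(self, files):
        self.files = set(files)

    def apply_clean(self, deleted_files):
        # Strict check modeled on the one described above: every file a CLEAN
        # deletes must already be present in the metadata table.
        for path in deleted_files:
            if path not in self.files:
                raise KeyError(f"{path} deleted by CLEAN but never added")
            self.files.remove(path)


def bootstrap(storage):
    """Bootstrap by listing whatever is on storage at this moment."""
    return MetadataTable(storage)


def run(clean_deletes_first):
    """Simulate one interleaving of bootstrap and an inflight CLEAN."""
    storage = {"p1/f1.parquet", "p1/f2.parquet"}
    clean_plan = ["p1/f1.parquet"]

    if clean_deletes_first:
        # Case 2: CLEAN deletes the file, then bootstrap lists the partition.
        storage -= set(clean_plan)
        table = bootstrap(storage)
    else:
        # Case 1: bootstrap lists the partition, then CLEAN deletes the file.
        table = bootstrap(storage)
        storage -= set(clean_plan)

    # The completed CLEAN is then applied to the metadata table in both cases.
    table.apply_clean(clean_plan)
    return table.files
```

In this model run(False) completes and leaves only p1/f2.parquet in the table, while run(True) raises at the strict check even though nothing was actually missed, which is the situation described as case 2.2.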