Prashant Wason created HUDI-2016:
------------------------------------

             Summary: Metadata table bootstrap does not work when there are 
inflight instances
                 Key: HUDI-2016
                 URL: https://issues.apache.org/jira/browse/HUDI-2016
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Prashant Wason


There is a race condition in metadata table bootstrap when there are inflight 
instances.

Example: Assume a CLEAN is in progress which is planning to delete 
p1/f1.parquet (as per clean plan). If bootstrap is going on at the same time, 
there are two cases possible:
 1. Bootstrap lists files in partition p1 BEFORE clean deletes them
    1.1. Hence p1/f1.parquet is added to metadata table during bootstrap
    1.2. When processing the CLEAN, p1/f1.parquet will be deleted from metadata table
 2. Bootstrap lists files in partition p1 AFTER clean deletes them
    2.1. p1/f1.parquet is not found
    2.2. When processing the CLEAN, p1/f1.parquet will be deleted from metadata table

We cannot differentiate 2.2 from the case where we missed adding p1/f1.parquet 
to the metadata table.

There is an exception in the metadata reader code to ensure that any file 
being deleted was added to the metadata table. This exception is thrown in case 
2.2 above.
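
The two orderings can be reproduced with a small toy model. This is only a 
sketch to illustrate the race, not Hudi code: MetadataTable, bootstrap, 
apply_clean and run are hypothetical names, the metadata table is modelled as 
a plain set of file paths, and p1/f2.parquet is an extra file added for 
illustration.

```python
class MetadataTable:
    """Toy stand-in for the metadata table: a set of known file paths."""

    def __init__(self):
        self.files = set()

    def bootstrap(self, listed_files):
        # Bootstrap records whatever the file listing returned at that moment.
        self.files.update(listed_files)

    def apply_clean(self, deleted_files):
        # Mirrors the sanity check described above: a file being deleted
        # must already be present in the metadata table.
        for path in deleted_files:
            if path not in self.files:
                raise KeyError(f"deleting unknown file: {path}")
            self.files.remove(path)


def run(bootstrap_lists_before_clean):
    storage = {"p1/f1.parquet", "p1/f2.parquet"}  # files on storage
    clean_plan = ["p1/f1.parquet"]                # inflight CLEAN
    table = MetadataTable()

    if bootstrap_lists_before_clean:
        table.bootstrap(storage)                  # listing sees f1
        storage.difference_update(clean_plan)     # clean deletes f1 afterwards
    else:
        storage.difference_update(clean_plan)     # clean deletes f1 first
        table.bootstrap(storage)                  # listing no longer sees f1

    try:
        table.apply_clean(clean_plan)             # CLEAN is processed later
        return "ok", sorted(table.files)
    except KeyError:
        return "error", sorted(table.files)
```

In this model run(True) completes normally, while run(False) hits the check 
even though the final contents of the table are the same in both orderings.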

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
