[ https://issues.apache.org/jira/browse/HUDI-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17375329#comment-17375329 ]

ASF GitHub Bot commented on HUDI-2016:
--------------------------------------

vinothchandar commented on a change in pull request #3083:
URL: https://github.com/apache/hudi/pull/3083#discussion_r664311668



##########
File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/metadata/TestHoodieBackedMetadata.java
##########
@@ -120,46 +120,63 @@ public void testDefaultNoMetadataTable() throws Exception {
     assertThrows(TableNotFoundException.class, () -> HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataTableBasePath).build());
 
     // Metadata table is not created if disabled by config
+    String firstCommitTime = HoodieActiveTimeline.createNewInstantTime();
     try (SparkRDDWriteClient client = new SparkRDDWriteClient(engineContext, getWriteConfig(true, false))) {
-      client.startCommitWithTime("001");
-      client.insert(jsc.emptyRDD(), "001");
+      client.startCommitWithTime(firstCommitTime);
+      client.insert(jsc.parallelize(dataGen.generateInserts(firstCommitTime, 5)), firstCommitTime);

Review comment:
       As @prashantwason mentioned, it's actually important that we do that in preWrite() so that we bring the metadata table in sync with the timeline before writes happen. So I'm not sure it can be removed.
   
   We are moving towards a synchronous design for updating metadata anyway, so let's maybe revisit this in that context?
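To make the point about preWrite() concrete, here is a minimal toy model, assuming an asynchronous design in which the metadata table only catches up with the data timeline when the next write starts. Every class, field and method below (`PreWriteSyncSketch`, `timeline`, `metadataApplied`, `preWrite`, `write`) is hypothetical and is not Hudi's API; it only shows why removing the sync step would let a write run against a stale metadata table.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of "sync the metadata table with the timeline in preWrite()".
 * All names here are hypothetical; this is not Hudi code.
 */
public class PreWriteSyncSketch {

    /** Completed instants on the data timeline, oldest first. */
    final List<String> timeline = new ArrayList<>();

    /** Instants already applied to the metadata table; always a prefix of the timeline here. */
    final List<String> metadataApplied = new ArrayList<>();

    /** Bring the metadata table up to date before any new data is written. */
    void preWrite() {
        // Apply, in timeline order, every completed instant the metadata table has not seen yet.
        for (int i = metadataApplied.size(); i < timeline.size(); i++) {
            metadataApplied.add(timeline.get(i));
        }
    }

    /** One commit: sync first, then (pretend to) write data, then complete the instant. */
    void write(String instantTime) {
        preWrite();
        // Writing the actual data files for instantTime is omitted in this sketch.
        timeline.add(instantTime);
        // In this asynchronous model the metadata table only catches up on the NEXT preWrite().
    }

    public static void main(String[] args) {
        PreWriteSyncSketch table = new PreWriteSyncSketch();
        // Two instants completed while the metadata table was lagging behind.
        table.timeline.add("001");
        table.timeline.add("002");

        table.write("003");
        System.out.println("timeline = " + table.timeline);
        System.out.println("metadata = " + table.metadataApplied);
    }
}
```

Running `main` prints a timeline of `[001, 002, 003]` and a metadata table of `[001, 002]`: the table was brought up to date before `003` was written, but still lags by the latest commit. A synchronous design, as mentioned in the comment, would update the metadata table as part of each commit, which would remove that lag.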




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.



> Metadata table bootstrap does not work when there are inflight instances
> ------------------------------------------------------------------------
>
>                 Key: HUDI-2016
>                 URL: https://issues.apache.org/jira/browse/HUDI-2016
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Prashant Wason
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>
> There is a race condition in metadata table bootstrap when there are inflight 
> instances.
> Example: Assume a CLEAN is in progress that plans to delete 
> p1/f1.parquet (as per the clean plan). If bootstrap runs at the same time, 
> two cases are possible:
>  # bootstrap lists files in partition p1 BEFORE the clean deletes them
>  ## hence p1/f1.parquet is added to the metadata table during bootstrap
>  ## When processing the CLEAN, p1/f1.parquet is deleted from the metadata 
> table
>  # bootstrap lists files in partition p1 AFTER the clean deletes them
>  ## p1/f1.parquet is not found, so it is never added to the metadata table
>  ## When processing the CLEAN, p1/f1.parquet is deleted from the metadata 
> table
> We cannot differentiate case 2.2 from the case where we missed adding 
> p1/f1.parquet to the metadata table.
> The metadata reader code throws an exception if a file being deleted was 
> never added to the metadata table. This exception is thrown in case 2.2 
> above.
>  
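The two interleavings in the issue description can be reproduced with a small in-memory sketch. Everything here is a hypothetical toy model, not Hudi code: one `Set<String>` stands in for the files in partition p1 on storage, another stands in for the metadata table, and `applyCleanToMetadata` only mimics the strict reader check described in the issue. It does not propose a fix.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Toy reproduction of the bootstrap-vs-CLEAN race from HUDI-2016.
 * All names are hypothetical; this is not Hudi code.
 */
public class BootstrapCleanRaceSketch {

    /** Files physically present in partition p1. */
    final Set<String> fileSystem = new HashSet<>();

    /** Files the metadata table believes are in p1. */
    final Set<String> metadataFiles = new HashSet<>();

    /** Bootstrap: list the partition and record whatever is there right now. */
    void bootstrap() {
        metadataFiles.addAll(fileSystem);
    }

    /** The in-flight CLEAN physically deletes the file named in its clean plan. */
    void cleanDeletesFile(String file) {
        fileSystem.remove(file);
    }

    /** Apply the CLEAN to the metadata table; throws if the file was never added (strict check). */
    void applyCleanToMetadata(String file) {
        if (!metadataFiles.remove(file)) {
            throw new IllegalStateException(file + " was never added to the metadata table");
        }
    }

    /** Runs one interleaving and returns true if applying the CLEAN hit the exception. */
    static boolean run(boolean bootstrapListsBeforeDelete) {
        BootstrapCleanRaceSketch t = new BootstrapCleanRaceSketch();
        String file = "p1/f1.parquet";
        t.fileSystem.add(file);
        if (bootstrapListsBeforeDelete) {
            t.bootstrap();            // case 1.1: file is listed and added
            t.cleanDeletesFile(file);
        } else {
            t.cleanDeletesFile(file);
            t.bootstrap();            // case 2.1: file is already gone, so it is not added
        }
        try {
            t.applyCleanToMetadata(file); // cases 1.2 and 2.2
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("bootstrap before delete -> exception? " + run(true));
        System.out.println("bootstrap after delete  -> exception? " + run(false));
    }
}
```

Only the second interleaving throws. After it, the metadata table has never seen p1/f1.parquet, which is the same state a genuinely missed add would leave behind, so a check that looks only at the metadata table cannot tell the two apart.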



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
