[
https://issues.apache.org/jira/browse/HUDI-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17397647#comment-17397647
]
ASF GitHub Bot commented on HUDI-2119:
--------------------------------------
vinothchandar commented on a change in pull request #3210:
URL: https://github.com/apache/hudi/pull/3210#discussion_r687045652
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -137,6 +138,9 @@ protected HoodieBackedTableMetadataWriter(Configuration hadoopConf, HoodieWriteC
private HoodieWriteConfig createMetadataWriteConfig(HoodieWriteConfig writeConfig) {
int parallelism = writeConfig.getMetadataInsertParallelism();
+ int minCommitsToKeep = Math.max(writeConfig.getMetadataMinCommitsToKeep(), writeConfig.getMinCommitsToKeep());
Review comment:
nice.
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -504,6 +511,20 @@ public void close() throws Exception {
}
}
+ /**
+ * Return the timestamp of the latest synced instant.
+ */
+ @Override
+ public Option<String> getLatestSyncedInstantTime() {
Review comment:
Couldn't we call `metadata.getSyncedInstantTime()` here?
##########
File path:
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadata.java
##########
@@ -106,5 +109,13 @@ static HoodieTableMetadata create(HoodieEngineContext engineContext, HoodieMetad
*/
Option<String> getSyncedInstantTime();
+ /**
+ * Get the instant time to which the metadata is synced w.r.t data timeline on the reader side.
+ *
+ * This is different from the getSyncedInstantTime() because the reader should sync all completed instants
+ * from the data timeline in order to always provide the most up to date view of the files within the dataset.
+ */
+ Option<String> getSyncedInstantTimeForReader();
Review comment:
So we need this only for tests?
```
./hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadata.java: Option<String> getSyncedInstantTimeForReader();
./hudi-common/src/main/java/org/apache/hudi/metadata/BaseTableMetadata.java: public Option<String> getSyncedInstantTimeForReader() {
./hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java: public Option<String> getSyncedInstantTimeForReader() {
./hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java: assertEquals(metadata.getSyncedInstantTimeForReader().get(), newCommitTime);
```
##########
File path:
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
##########
@@ -480,6 +497,144 @@ public void testRollbackUnsyncedCommit(HoodieTableType tableType) throws Excepti
client.syncTableMetadata();
validateMetadata(client);
}
+
+ // If an unsynced commit is automatically rolled back during next commit, the rollback commit gets a timestamp
+ // greater than the new commit which is started. Ensure that in this case the rollback is not processed
+ // as the earlier failed commit would not have been committed.
+ //
+ // Dataset: C1 C2 C3.inflight[failed] C4 R5[rolls back C3]
+ // Metadata: C1.delta C2.delta
Review comment:
This fix also helps avoid the inconsistent exception in the following
situation. cc @umehrot2
```
// Dataset: C1 C2 C3.inflight[failed] R4[rolls back C3] C4
// Metadata: C1.delta C2.delta
```
##########
File path:
hudi-common/src/main/java/org/apache/hudi/metadata/TimelineMergedTableMetadata.java
##########
@@ -112,4 +116,15 @@ private void processNextRecord(HoodieRecord<? extends HoodieRecordPayload> hoodi
public Option<HoodieRecord<HoodieMetadataPayload>> getRecordByKey(String key) {
return Option.ofNullable((HoodieRecord) timelineMergedRecords.get(key));
}
+
+ /**
+ * Returns the timestamp of the latest synced instant.
+ */
+ public Option<String> getSyncedInstantTime() {
Review comment:
Could we call this something else? This basically returns the earliest
unsynced instant time, right? Mixing `sync` terminology here is kind of
throwing me off while trying to grok this.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
> Syncing of rollbacks to metadata table does not work in all cases
> -----------------------------------------------------------------
>
> Key: HUDI-2119
> URL: https://issues.apache.org/jira/browse/HUDI-2119
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Prashant Wason
> Assignee: Prashant Wason
> Priority: Blocker
> Labels: pull-request-available, release-blocker
> Fix For: 0.9.0
>
>
> This is an issue with inline automatic rollbacks.
> The metadata table assumes that a rollback is to be applied if the
> instant-being-rolled-back has a timestamp less than the last deltacommit time
> on the metadata timeline. We do not explicitly check whether the
> instant-being-rolled-back was actually written to the metadata table.
> A rollback adds a record to the metadata table which "deletes" files from a
> failed/earlier commit. If the files being deleted were never actually
> committed to the metadata table earlier, the deletes cannot be consolidated
> during metadata table reads. This leads to a HoodieMetadataException, as we
> cannot differentiate this from a bug where we might have missed committing a
> commit to the metadata table.
>
>
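The difference between the timestamp-only rule described in the issue and an
explicit "was it actually written" check can be sketched roughly as below.
This is only an illustration under stated assumptions: the class
`RollbackSyncSketch`, both method names, and the short instant strings
("001", "002", ...) are hypothetical and are not Hudi APIs or the code in the
pull request. Instant times are assumed to be strings that order
lexicographically.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; none of these names come from Hudi itself.
final class RollbackSyncSketch {

  // Timestamp-only rule from the issue description: apply the rollback if the
  // rolled-back instant is older than the last deltacommit on the metadata timeline.
  static boolean shouldApplyByTimestamp(String rolledBackInstant,
                                        TreeSet<String> metadataDeltaCommits) {
    return !metadataDeltaCommits.isEmpty()
        && rolledBackInstant.compareTo(metadataDeltaCommits.last()) < 0;
  }

  // Explicit rule: apply the rollback only if the rolled-back instant was
  // actually written to the metadata timeline.
  static boolean shouldApplyByMembership(String rolledBackInstant,
                                         Set<String> metadataDeltaCommits) {
    return metadataDeltaCommits.contains(rolledBackInstant);
  }

  public static void main(String[] args) {
    // Dataset:  C1 C2 C3.inflight[failed] C4 R5[rolls back C3]
    // Metadata: C1.delta C2.delta, then C4.delta once C4 has been synced.
    TreeSet<String> metadata = new TreeSet<>(Arrays.asList("001", "002", "004"));
    String rolledBack = "003"; // C3 failed and never reached the metadata table

    System.out.println("timestamp rule applies rollback:  "
        + shouldApplyByTimestamp(rolledBack, metadata));
    System.out.println("membership rule applies rollback: "
        + shouldApplyByMembership(rolledBack, metadata));
  }
}
```

In this scenario the timestamp rule would apply the rollback of C3 even though
C3's files were never committed to the metadata table, which is the case the
issue says ends in a HoodieMetadataException; the membership rule skips it.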
--
This message was sent by Atlassian Jira
(v8.3.4#803005)