[
https://issues.apache.org/jira/browse/HUDI-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17931122#comment-17931122
]
Lokesh Jain commented on HUDI-9030:
-----------------------------------
The tests below were run against table version 6.
{code:java}
spark-client module tests:
TestHoodieSparkCopyOnWriteTableArchiveWithReplace
TestHoodieSparkCopyOnWriteTableRollback
TestHoodieSparkMergeOnReadTableCompaction
TestHoodieSparkMergeOnReadTableIncrementalRead
TestHoodieSparkMergeOnReadTableInsertUpdateDelete
TestHoodieSparkMergeOnReadTableRollback
TestHoodieSparkRollback
TestHoodieMergeOnReadTable
TestCleanerInsertAndCleanByCommits
spark scala tests:
TestRecordLevelIndex
TestHoodieSparkSqlWriter
TestCOWDataSource
TestCOWDataSourceStorage
TestMORDataSource
TestMORDataSourceStorage
TestMORDataSourceWithBucketIndex
TestSparkDataSource
TestSparkSqlCoreFlow
TestStreamingSource
TestStructuredStreaming
TestTimeTravelQuery
{code}
> Validate and certify log files and marker interplays in MOR using table
> version6
> ---------------------------------------------------------------------------------
>
> Key: HUDI-9030
> URL: https://issues.apache.org/jira/browse/HUDI-9030
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: sivabalan narayanan
> Assignee: Lokesh Jain
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.2
>
> Original Estimate: 6h
> Remaining Estimate: 6h
>
> Validate and certify the interplay between log files and markers in MOR
> tables written with table version 6, in comparison to the 0.x writer.
> The table below lists the behaviour of log file and marker generation in
> 0.x and compares it with 1.x using table version 6 and 1.x using the latest
> table version.
> || ||0.x||Table Version 6||1.x||
> |Log file name instant|Base file instant|Base file instant|Deltacommit instant|
> |Log file name write token|Rollover log write token is always newly created. Log write token is created from the latest log file.|Rollover log write token concept is removed. Log write token is used instead.|Rollover log write token concept is removed. Log write token is used instead.|
> |Log file version|Computed using latest log file|Computed using latest log file|Computed using latest log file|
> |Append to existing log file|Allowed|Not supported. Writes happen to a new file.|Not supported. Writes happen to a new file.|
> |Marker generation|Marker is created during append as well as when a new file is created|Marker is created every time a new log file is created|Marker is created every time a new log file is created|
> Other issues
> # For table version 6, AbstractHoodieLogRecordScanner ignores log blocks
> that belong to inflight instants while scanning. The PR fixes the logic so
> that such log blocks are not ignored. This is required for updating RLI,
> which reads the deleted records from the data table.
> # hoodie.file.group.reader.enabled needs to be disabled for table version 6.
> # The new rollback logic filters log files by the deltacommit timestamp
> and then marks them for deletion. This does not work for table version 6,
> since log file names do not contain the deltacommit timestamp. Therefore the
> older rollback logic was brought back here.
> # The PR removes the compaction-scheduling validation that requires the
> compaction instant to be greater than all completed deltacommit instants.
> This validation was added for table version 6 but is not really required.
> # KEY_GENERATOR_CLASS_NAME and KEY_GENERATOR_TYPE are new configs that are
> required in table version 6 as well. The PR makes a change so that these
> configs are not ignored.
> # In MarkerBasedRollbackStrategy#createRollbackRequestForCreateAndMerge, the
> PR removes the validation that a log file should not have IOType CREATE in
> table version 6. CREATE is still used for log files with table version 6 in
> LogFileCreationCallback#preFileCreation.
> # hoodie.datasource.read.incr.fallback.fulltablescan.enable needs to be
> disabled for table version 6.
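
To illustrate the rollback point above (filtering log files by deltacommit timestamp cannot work when the log file name carries the base file instant), here is a minimal sketch. It is not the Hudi implementation: the class LogFileRollbackSketch, its methods, and the simplified name layout .<fileId>_<instant>.log.<version>_<writeToken> are assumptions made for illustration, and the file names and instants are made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Hudi.
final class LogFileRollbackSketch {

    // Assumed, simplified layout: .<fileId>_<instant>.log.<version>_<writeToken>
    private static final Pattern LOG_FILE =
        Pattern.compile("^\\.(.+)_(\\d+)\\.log\\.(\\d+)_(\\d+-\\d+-\\d+)$");

    // Returns the instant embedded in a log file name.
    static String instantOf(String logFileName) {
        Matcher m = LOG_FILE.matcher(logFileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a log file name: " + logFileName);
        }
        return m.group(2);
    }

    // Keeps only the log files whose embedded instant equals the instant being rolled back.
    static List<String> filterByInstant(List<String> logFiles, String instantToRollback) {
        return logFiles.stream()
            .filter(name -> instantOf(name).equals(instantToRollback))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Table version 6 style: every log file carries the base file instant (001).
        List<String> tableVersion6 = Arrays.asList(
            ".fg1_001.log.1_0-1-0",
            ".fg1_001.log.2_0-2-0");
        // Latest table version style: each log file carries its deltacommit instant.
        List<String> latestVersion = Arrays.asList(
            ".fg1_002.log.1_0-1-0",
            ".fg1_003.log.1_0-2-0");

        // Rolling back deltacommit 003 by name-based filtering:
        System.out.println(filterByInstant(tableVersion6, "003")); // []
        System.out.println(filterByInstant(latestVersion, "003")); // [.fg1_003.log.1_0-2-0]
    }
}
```

With the table version 6 naming, the filter finds nothing to delete for deltacommit 003 even though the second log file may have been written by it, which is consistent with the decision to fall back to the older rollback logic for that table version.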
--
This message was sent by Atlassian Jira
(v8.20.10#820010)