[ 
https://issues.apache.org/jira/browse/HUDI-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17931122#comment-17931122
 ] 

Lokesh Jain commented on HUDI-9030:
-----------------------------------

The following tests were run with table version 6.

{code:java}
spark-client module tests:
TestHoodieSparkCopyOnWriteTableArchiveWithReplace
TestHoodieSparkCopyOnWriteTableRollback
TestHoodieSparkMergeOnReadTableCompaction
TestHoodieSparkMergeOnReadTableIncrementalRead
TestHoodieSparkMergeOnReadTableInsertUpdateDelete
TestHoodieSparkMergeOnReadTableRollback
TestHoodieSparkRollback
TestHoodieMergeOnReadTable
TestCleanerInsertAndCleanByCommits

spark scala tests:
TestRecordLevelIndex
TestHoodieSparkSqlWriter
TestCOWDataSource
TestCOWDataSourceStorage
TestMORDataSource
TestMORDataSourceStorage
TestMORDataSourceWithBucketIndex
TestSparkDataSource
TestSparkSqlCoreFlow
TestStreamingSource
TestStructuredStreaming
TestTimeTravelQuery
{code}

> Validate and certify log files and marker interplays in MOR using table 
> version6 
> ---------------------------------------------------------------------------------
>
>                 Key: HUDI-9030
>                 URL: https://issues.apache.org/jira/browse/HUDI-9030
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: sivabalan narayanan
>            Assignee: Lokesh Jain
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.0.2
>
>   Original Estimate: 6h
>  Remaining Estimate: 6h
>
> Validate and certify the interplay of log files and markers in MOR using 
> table version 6, in comparison to the 0.x writer.
> The table below lists the behaviour of log file and marker generation in 0.x 
> and compares it with 1.x using table version 6 and 1.x using the latest table 
> version.
> || ||0.x||1.x with table version 6||1.x with latest table version||
> |Log file name instant|Base file instant|Base file instant|Deltacommit instant|
> |Log file name write token|Rollover log write token is always newly created. Log write token is created from the latest log file.|Rollover log write token concept is removed. Log write token is used instead.|Rollover log write token concept is removed. Log write token is used instead.|
> |Log file version|Computed using latest log file|Computed using latest log file|Computed using latest log file|
> |Append to existing log file|Allowed|Not supported. Writes happen to a new file.|Not supported. Writes happen to a new file.|
> |Marker generation|Marker is created during append as well as when a new file is created|Marker is created every time a new log file is created|Marker is created every time a new log file is created|
> Other issues
> # For table version 6, AbstractHoodieLogRecordScanner ignores log blocks 
> that belong to inflight instants while scanning. The PR fixes the logic so 
> that such log blocks are not ignored. This is required for updating the RLI 
> (record level index), which reads the deleted records from the data table.
> # hoodie.file.group.reader.enabled needs to be disabled for table version 6.
> # The new rollback logic filters log files using the deltacommit timestamp 
> and then marks them for deletion. This does not work for table version 6, 
> since log files do not have the deltacommit timestamp in their name. 
> Therefore the older rollback logic was brought back for table version 6.
> # The PR removes the check, performed while scheduling compaction, that the 
> compaction instant is greater than all completed deltacommit instants. 
> This validation was added for table version 6 but is not really required.
> # KEY_GENERATOR_CLASS_NAME and KEY_GENERATOR_TYPE are new configs which are 
> required in table version 6 as well. The PR makes a change so that these 
> configs are not ignored.
> # In MarkerBasedRollbackStrategy#createRollbackRequestForCreateAndMerge, the 
> validation that a log file should not have IOType CREATE is removed for table 
> version 6. CREATE is still used for log files with table version 6 in 
> LogFileCreationCallback#preFileCreation.
> # hoodie.datasource.read.incr.fallback.fulltablescan.enable needs to be 
> disabled for table version 6.
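
The rollback point in the description (filtering log files by deltacommit timestamp) can be illustrated with a small sketch. It is not Hudi code: the class, the helper names, and the file name layout `.<fileId>_<instant>.log.<version>_<writeToken>` are assumptions made for illustration, and the file group id, instants, and write token are made-up values. It only shows why a filter keyed on the deltacommit instant finds nothing when the log file name carries the base file instant, as it does in table version 6.

```java
import java.util.List;
import java.util.stream.Collectors;

public class LogFileNameSketch {

    // Hypothetical helper. Assumed layout: .<fileId>_<instant>.log.<version>_<writeToken>
    static String logFileName(String fileId, String instant, int version, String writeToken) {
        return "." + fileId + "_" + instant + ".log." + version + "_" + writeToken;
    }

    // Extracts the instant embedded in the name (assumes fileId contains no underscore).
    static String instantOf(String logFileName) {
        int start = logFileName.indexOf('_') + 1;
        int end = logFileName.indexOf(".log.");
        return logFileName.substring(start, end);
    }

    // Mimics a rollback that selects log files by the instant being rolled back.
    static List<String> filterByInstant(List<String> logFiles, String instantToRollback) {
        return logFiles.stream()
                .filter(f -> instantOf(f).equals(instantToRollback))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String baseInstant = "001";   // instant of the base file (made-up value)
        String deltaCommit = "003";   // deltacommit being rolled back (made-up value)

        // Table version 6: the name carries the base file instant.
        List<String> v6Files = List.of(logFileName("fg1", baseInstant, 1, "1-0-1"));
        // Latest table version: the name carries the deltacommit instant.
        List<String> latestFiles = List.of(logFileName("fg1", deltaCommit, 1, "1-0-1"));

        System.out.println("v6 matches:     " + filterByInstant(v6Files, deltaCommit));
        System.out.println("latest matches: " + filterByInstant(latestFiles, deltaCommit));
    }
}
```

With these inputs the filter selects no file for the table version 6 name and one file for the latest table version name, which is the mismatch that led to restoring the older rollback logic.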

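The two settings that the description says must be disabled for table version 6 can be collected into one options map, sketched below. The two "false" keys come from the description. The `hoodie.write.table.version` key is an assumption about how the writer is pinned to table version 6 and should be checked against the Hudi release in use. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TableVersion6Options {

    // Collects the options discussed above for a table version 6 setup.
    static Map<String, String> tableVersion6Options() {
        Map<String, String> opts = new LinkedHashMap<>();
        // Assumption: config key that pins the writer to table version 6.
        opts.put("hoodie.write.table.version", "6");
        // From the description: must be disabled for table version 6.
        opts.put("hoodie.file.group.reader.enabled", "false");
        opts.put("hoodie.datasource.read.incr.fallback.fulltablescan.enable", "false");
        return opts;
    }

    public static void main(String[] args) {
        // Print as key=value pairs, e.g. for pasting into a properties file.
        tableVersion6Options().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

A map like this could be passed as options to a Spark reader or writer. Which options apply on the read path and which on the write path is not stated in the description.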


--
This message was sent by Atlassian Jira
(v8.20.10#820010)