[ 
https://issues.apache.org/jira/browse/HUDI-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866210#comment-17866210
 ] 

Ethan Guo commented on HUDI-7971:
---------------------------------

We can leverage and extend the bundle validation script and framework 
([hudi/packaging/bundle-validation at master · apache/hudi · 
GitHub|https://github.com/apache/hudi/tree/master/packaging/bundle-validation]) 
to implement the compatibility tests. These tests should be added as a GitHub 
CI workflow so that they run on every PR.
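
As a rough illustration of what such a workflow would need to cover, the sketch 
below enumerates a compatibility matrix from the dimensions listed in the issue 
description quoted below. All names (READERS, WRITER_VERSIONS, build_matrix, the 
dict keys) are hypothetical and not part of the existing bundle-validation 
framework. The writer versions expand the "0.14.x to 0.16.x" range from the issue 
title, and only the metadata knob is included to keep the example small.

```python
from itertools import product

# Hypothetical labels for the 1.x readers listed in the issue description.
READERS = ["spark-sql", "spark-datasource", "trino-presto", "hive", "flink"]

# Assumption: the "0.14.x to 0.16.x" range expanded into release lines.
WRITER_VERSIONS = ["0.14.x", "0.15.x", "0.16.x"]

# Table types from the "Table State" section of the issue.
TABLE_TYPES = ["COW", "MOR"]

# One of the "other knobs": metadata table enabled or disabled.
METADATA_ENABLED = [True, False]


def build_matrix():
    """Return one dict per (reader, writer version, table type, metadata) combination."""
    return [
        {
            "reader": reader,
            "writer_version": writer_version,
            "table_type": table_type,
            "metadata_enabled": metadata_enabled,
        }
        for reader, writer_version, table_type, metadata_enabled in product(
            READERS, WRITER_VERSIONS, TABLE_TYPES, METADATA_ENABLED
        )
    ]
```

Each entry could then be mapped to one CI job or one validation script 
invocation. Further knobs from the issue (column stats, RLI, partitioning) would 
be added as extra dimensions in the same way, at the cost of a larger matrix.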

> Test and Certify 0.14.x to 0.16.x tables are readable in 1.x Hudi reader 
> -------------------------------------------------------------------------
>
>                 Key: HUDI-7971
>                 URL: https://issues.apache.org/jira/browse/HUDI-7971
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: sivabalan narayanan
>            Priority: Major
>             Fix For: 1.0.0
>
>
> Let's ensure the 1.x reader is fully compatible with reading any 0.14.x to 
> 0.16.x tables.
>  
> Readers :  1.x
>  # Spark SQL
>  # Spark Datasource
>  # Trino/Presto
>  # Hive
>  # Flink
> Writer: 0.16
> Table State:
>  * COW
>  ** few write commits 
>  ** Pending clustering
>  ** Completed Clustering
>  ** Failed writes with no rollbacks
>  ** Insert overwrite table/partition
>  ** Savepoint for Time-travel query
>  * MOR
>  ** Same as COW
>  ** Pending and completed async compaction (with log-files and no base file)
>  ** Custom Payloads (for MOR snapshot queries) (e.g. SQL Expression Payload)
>  ** Log block formats - DELETE, rollback block
> Other knobs:
>  # Metadata enabled/disabled (all combinations)
>  # Column Stats enabled/disabled and data-skipping enabled/disabled
>  # RLI enabled with eq/IN queries
>  # Non-Partitioned dataset (all combinations)
>  # CDC Reads 
>  # Incremental Reads
>  # Time-travel query
>  
> What to test ?
>  # Query Results Correctness
>  # Performance: see the benefit of
>  ## Partition Pruning
>  ## Metadata table - col stats, RLI
>  
> Corner Case Testing:
>  
>  # Schema Evolution with different file-groups having different generations 
> of the schema
>  # Dynamic Partition Pruning
>  # Does Column Projection work correctly when reading log files?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
