Dear community,

Here are the Hudi community bi-weekly updates for 2021-05-09 ~ 2021-05-22, covering features, bug fixes and tests.

=======================================
Features

[Flink Integration] Avoid generating corrupted files for Flink sink [1]
[Core] Support reading older snapshots [2]
[Flink Integration] Global index for Flink writer [3]
[Flink Integration] Reuse the partition path and file group id for Flink write data buffer [4]

[1] https://issues.apache.org/jira/browse/HUDI-1886
[2] https://issues.apache.org/jira/browse/HUDI-1789
[3] https://issues.apache.org/jira/browse/HUDI-1902
[4] https://issues.apache.org/jira/browse/HUDI-1911

=======================================
Bugs

[Core] Reduce log level of overly verbose messages from info to debug [1]
[Flink Integration] FlinkCreateHandle and FlinkAppendHandle canWrite should always return true [2]
[Flink Integration] Validate required fields for Flink HoodieTable [3]
[Flink Integration] Close the file handles gracefully for Flink write function to avoid corrupted files [4]
[Spark Integration] Fix NPE when a hive beeline/spark-sql query specifies fields on a MOR table [5]
[Flink Integration] Always close the file handle for a Flink mini-batch write [6]
[Flink Integration] Support skipping bootstrapIndex's init in abstract fs view init [7]
[Flink Integration] Clean the corrupted files generated by FlinkMergeAndReplaceHandle [8]
[Hive Integration] Honor skipROSuffix in Spark DS [9]
[Core] Use streams instead of loops for input/output [10]
[Flink Integration] Fix the file id for write data buffer before flushing [11]
[Flink Integration] Fix hive conf for Flink writer hive meta sync [12]
[Hive Integration] Fix incorrect partition field in incremental queries of MOR tables with Hive on Spark/MR [13]
[Flink Integration] Remove the metadata sync logic in HoodieFlinkWriteClient#preWrite because it is not thread safe [14]
[Core] Fix NPE when the nested partition path field has null value [15]
[Flink Integration] Fix incorrect keyBy field that causes serious data skew, to avoid multiple subtasks writing to a partition at the same time [16]
[Core] Fix insert-overwrite API archival [17]

[1] https://issues.apache.org/jira/browse/HUDI-1707
[2] https://issues.apache.org/jira/browse/HUDI-1890
[3] https://issues.apache.org/jira/browse/HUDI-1818
[4] https://issues.apache.org/jira/browse/HUDI-1895
[5] https://issues.apache.org/jira/browse/HUDI-1722
[6] https://issues.apache.org/jira/browse/HUDI-1900
[7] https://issues.apache.org/jira/browse/HUDI-1446
[8] https://issues.apache.org/jira/browse/HUDI-1876
[9] https://issues.apache.org/jira/browse/HUDI-1806
[10] https://issues.apache.org/jira/browse/HUDI-1913
[11] https://issues.apache.org/jira/browse/HUDI-1915
[12] https://issues.apache.org/jira/browse/HUDI-1871
[13] https://issues.apache.org/jira/browse/HUDI-1719
[14] https://issues.apache.org/jira/browse/HUDI-1917
[15] https://issues.apache.org/jira/browse/HUDI-1888
[16] https://issues.apache.org/jira/browse/HUDI-1918
[17] https://issues.apache.org/jira/browse/HUDI-1740

=======================================
Tests

[Tests] Add long-running test suite automation scripts for docker [1]
[Tests] Remove hardcoded parquet in tests [2]
[Tests] Add Spark datasource unit test for schema validation on add column [3]

[1] https://issues.apache.org/jira/browse/HUDI-1851
[2] https://issues.apache.org/jira/browse/HUDI-1055
[3] https://issues.apache.org/jira/browse/HUDI-1768

Best,
Leesf