[ https://issues.apache.org/jira/browse/HUDI-9767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Voon Hou updated HUDI-9767:
---------------------------
Description:
Upstreaming the following Trino improvements to Hudi-Trino:
# Increase Default MaxOutstandingSplits and SplitLoaderParallelism
# HUDI-9525 Extend file system cache support to Hudi connector
## Note that all code changes unrelated to the trino-hudi plugin are excluded
from the cherry-pick
# Fix split generation parallelism for a non-partitioned table
# -[MINOR] Disable PR labeler- (Skipped as this modifies a file in
.github/workflows)
# -Add parquet page skipping in Iceberg Connector- (Skipped as this is for
testing purposes)
# -HUDI-9577 Make target result size configurable from endpoint and server-
(Skipped as all changes are in non-Hudi Trino modules)
# -Create pipeline to build arm image- (Skipped as this modifies GitHub
Actions)
# [MINOR] Revert changes made to non-hudi modules
# [MINOR] Added optimizations for HudiColumnStatsIndexSupport
# [MINOR] Cleanup HudiSplitFactory to extend cache support
# Fix flakiness when testing cache correctness
# [Trino] Enable Metadata Table by default
# [Trino] Fix flaky tests due to table stats computation lagging behind query
execution
# Implement Metadata table based Partition listing
## Changes in trino-hive are ignored
# Fix Case Sensitivity Issues Between Table and Catalog Schemas
# [Trino] Workers should use latest commit time from table handle
# Incorrect query results for Merge-On-Read (RT) tables when column stats are
enabled
# Move partition parsing from Trino-Hive to Trino-Hudi module
# [MINOR] Increase async index loading to 10s to reduce probability of flaky
tests
Upstream *STARTS* at commit hash (inclusive):
ea7f22d0371173a31be0c693a24fa00b7374fe0f
Upstream *ENDS* at commit hash (inclusive):
cc62560c6d648fd7b70eb2a1b96e77d96fb0abb9
was:
Upstreaming the following Trino improvements to Hudi-Trino:
# Increase Default MaxOutstandingSplits and SplitLoaderParallelism
# HUDI-9525 Extend file system cache support to Hudi connector
## Note that all code changes unrelated to the trino-hudi plugin are excluded
from the cherry-pick
# Fix split generation parallelism for a non-partitioned table
# -[MINOR] Disable PR labeler- (Skipped as this modifies a file in
.github/workflows)
# -Add parquet page skipping in Iceberg Connector- (Skipped as this is for
testing purposes)
# -HUDI-9577 Make target result size configurable from endpoint and server-
(Skipped as all changes are in non-Hudi Trino modules)
# -Create pipeline to build arm image- (Skipped as this modifies GitHub
Actions)
# [MINOR] Revert changes made to non-hudi modules
# [MINOR] Added optimizations for HudiColumnStatsIndexSupport
# [MINOR] Cleanup HudiSplitFactory to extend cache support
# Fix flakiness when testing cache correctness
# [Trino] Enable Metadata Table by default
# [Trino] Fix flaky tests due to table stats computation lagging behind query
execution
# Implement Metadata table based Partition listing
## Changes in trino-hive are ignored
# Fix Case Sensitivity Issues Between Table and Catalog Schemas
# [Trino] Workers should use latest commit time from table handle
# Incorrect query results for Merge-On-Read (RT) tables when column stats are
enabled
# Move partition parsing from Trino-Hive to Trino-Hudi module
Upstream *STARTS* at commit hash (inclusive):
ea7f22d0371173a31be0c693a24fa00b7374fe0f
Upstream *ENDS* at commit hash (inclusive):
cc62560c6d648fd7b70eb2a1b96e77d96fb0abb9
> Upstream Trino Improvements to Hudi-Trino
> -----------------------------------------
>
> Key: HUDI-9767
> URL: https://issues.apache.org/jira/browse/HUDI-9767
> Project: Apache Hudi
> Issue Type: Task
> Reporter: Voon Hou
> Assignee: Voon Hou
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.1.0
>
>
> Upstreaming the following Trino improvements to Hudi-Trino:
>
> # Increase Default MaxOutstandingSplits and SplitLoaderParallelism
> # HUDI-9525 Extend file system cache support to Hudi connector
> ## Note that all code changes unrelated to the trino-hudi plugin are excluded
> from the cherry-pick
> # Fix split generation parallelism for a non-partitioned table
> # -[MINOR] Disable PR labeler- (Skipped as this modifies a file in
> .github/workflows)
> # -Add parquet page skipping in Iceberg Connector- (Skipped as this is for
> testing purposes)
> # -HUDI-9577 Make target result size configurable from endpoint and server-
> (Skipped as all changes are in non-Hudi Trino modules)
> # -Create pipeline to build arm image- (Skipped as this modifies GitHub
> Actions)
> # [MINOR] Revert changes made to non-hudi modules
> # [MINOR] Added optimizations for HudiColumnStatsIndexSupport
> # [MINOR] Cleanup HudiSplitFactory to extend cache support
> # Fix flakiness when testing cache correctness
> # [Trino] Enable Metadata Table by default
> # [Trino] Fix flaky tests due to table stats computation lagging behind
> query execution
> # Implement Metadata table based Partition listing
> ## Changes in trino-hive are ignored
> # Fix Case Sensitivity Issues Between Table and Catalog Schemas
> # [Trino] Workers should use latest commit time from table handle
> # Incorrect query results for Merge-On-Read (RT) tables when column stats
> are enabled
> # Move partition parsing from Trino-Hive to Trino-Hudi module
> # [MINOR] Increase async index loading to 10s to reduce probability of flaky
> tests
>
>
> Upstream *STARTS* at commit hash (inclusive):
> ea7f22d0371173a31be0c693a24fa00b7374fe0f
> Upstream *ENDS* at commit hash (inclusive):
> cc62560c6d648fd7b70eb2a1b96e77d96fb0abb9