Rahil Chertara created HUDI-6876:
------------------------------------

             Summary: Trino Hive Connector: Hudi Metadata Performance Regression
                 Key: HUDI-6876
                 URL: https://issues.apache.org/jira/browse/HUDI-6876
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Rahil Chertara


When using Athena v3 (Trino) to query a Hudi table with 10k partitions, I 
noticed very slow query performance compared to running the same query with 
the metadata feature disabled. 


The bottleneck appears to be that multiple Trino Hive threads attempt to read 
HFile data and get stuck in waiting states because each has to obtain a 
lock. 


I noticed that the following change

```

Disabling {{CACHE_DATA_ON_READ}} for HFileReader in {{HoodieHFileReaderFactory}}

```

can improve query performance when the metadata feature is enabled, but more 
investigation is needed to determine whether this has any side effects. 
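
For illustration only, here is a minimal sketch of what turning off 
cache-on-read could look like at the configuration level. It assumes the 
setting in question maps to HBase's {{CacheConfig.CACHE_DATA_ON_READ_KEY}} 
({{hbase.block.data.cacheonread}}). The class {{CacheOnReadSketch}} and the 
helper {{withCacheOnReadDisabled}} are hypothetical names, and a plain map 
stands in for the Hadoop configuration. How {{HoodieHFileReaderFactory}} 
actually passes this to the HFile reader is not shown in this issue.

```java
import java.util.HashMap;
import java.util.Map;

public class CacheOnReadSketch {
    // Assumed to match HBase's CacheConfig.CACHE_DATA_ON_READ_KEY.
    // HBase caches data blocks on read by default.
    public static final String CACHE_DATA_ON_READ_KEY = "hbase.block.data.cacheonread";

    // Hypothetical helper: returns a copy of the reader settings with
    // data-block caching on read switched off; the input map is not modified.
    public static Map<String, String> withCacheOnReadDisabled(Map<String, String> readerConf) {
        Map<String, String> copy = new HashMap<>(readerConf);
        copy.put(CACHE_DATA_ON_READ_KEY, "false");
        return copy;
    }

    public static void main(String[] args) {
        Map<String, String> base = new HashMap<>();
        base.put("hoodie.metadata.enable", "true"); // metadata table stays enabled
        Map<String, String> tuned = withCacheOnReadDisabled(base);
        System.out.println(tuned);
    }
}
```

The helper copies the settings rather than mutating them, so the cached and 
uncached variants can be compared side by side while investigating side 
effects.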



--
This message was sent by Atlassian Jira
(v8.20.10#820010)