Balaji Varadarajan created HUDI-637:
---------------------------------------
Summary: Investigate slower hudi queries in S3 vs HDFS
Key: HUDI-637
URL: https://issues.apache.org/jira/browse/HUDI-637
Project: Apache Hudi (incubating)
Issue Type: Task
Components: Performance
Reporter: Balaji Varadarajan
Fix For: 0.5.2
Hudi queries on S3 take abnormally long compared to the same queries on HDFS.
S3 listing itself does not take that long.
PERFORMANCE BUG:
Metadata listing performance is likely causing the slowdown in Hudi.
{{scala> stopwatch(\{ sql("SELECT * FROM ap_invoices_all_compacted_s3").count})}}
{{Elapsed time: 1m 55.078473113s}}
{{res2: Long = xxxxxxxxxxxx}}

{{scala> stopwatch(\{ sql("SELECT * FROM ap_invoices_all_compacted").count}) -- this is the exact same table in hdfs}}
{{Elapsed time: 6.581217052s}}
{{res3: Long = xxxxxxxxxxx}}
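
The {{stopwatch}} helper used above is not defined in this report and is not part of spark-shell by default. A minimal sketch of what such a helper might look like follows, so that the comparison can be reproduced. The name {{formatElapsed}} and the output formatting are assumptions chosen to resemble the "Elapsed time" lines above, not the reporter's actual code.

```scala
// Hypothetical helpers: the report does not show how `stopwatch` was defined.

// Render a nanosecond duration as "<m>m <s>.<nanos>s", omitting the
// minutes part when the duration is under one minute.
def formatElapsed(nanos: Long): String = {
  val minutes  = nanos / 60000000000L
  val wholeSec = (nanos / 1000000000L) % 60
  val frac     = nanos % 1000000000L
  val secs     = f"$wholeSec%d.$frac%09d" + "s"
  if (minutes > 0) s"${minutes}m $secs" else secs
}

// Evaluate `body` once, print the wall-clock time it took, and return its result.
// `body` is a by-name parameter, so it runs inside the timed region.
def stopwatch[T](body: => T): T = {
  val start = System.nanoTime()
  try body
  finally println(s"Elapsed time: ${formatElapsed(System.nanoTime() - start)}")
}
```

Pasted into spark-shell, this allows calls of the form {{stopwatch({ sql("...").count })}} as in the transcript. A single run measures wall-clock time only, so repeating each query a few times helps separate caching and warm-up effects from the S3 vs HDFS difference.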
--
This message was sent by Atlassian Jira
(v8.3.4#803005)