MOBIN-F opened a new issue, #3676: URL: https://github.com/apache/paimon/issues/3676
### Search before asking

- [X] I searched in the [issues](https://github.com/apache/paimon/issues) and found nothing similar.

### Paimon version

paimon-spark-3.3-0.8

### Compute Engine

Spark 3.3.2

### Minimal reproduce step

none

### What doesn't meet your expectations?

We have a Paimon primary key table and a non-Paimon table that contain the same data. For the query `... where pt=20240530 limit 10`, the Paimon primary key table is much slower than the non-Paimon table.

**paimon-pk table:**

Paimon TBLPROPERTIES:

```
"options" : {
  "bucket" : "1",
  "num-sorted-run.stop-trigger" : "2147483647",
  "changelog-producer" : "none",
  "snapshot.num-retained.max" : "3",
  "snapshot.num-retained.min" : "1",
  "sink.parallelism" : "5",
  "deletion-vectors.enabled" : "true",
  "compaction.optimization-interval" : "10",
  "sort-spill-threshold" : "10"
},
```

Query: `select * from paimon_catalog.rt_ods.paimon_xxxx_d where pt=20240530 limit 10`

```
$hadoop fs -ls -t -r /warehouse/rt_ods/paimon_uoc_order_main_d/pt=20240530/*
Found 5 items
-rwxrwx--x+ 2 hive supergroup 134829818 2024-07-04 23:53 /warehouse/rt_ods/paimon_xxxx_d/pt=20240530/bucket-0/data-c7aa5c7d-0f77-4567-abc3-066bdbd677b1-2.orc
-rwxrwx--x+ 2 hive supergroup 135744147 2024-07-05 08:03 /warehouse/rt_ods/paimon_xxxx_d/pt=20240530/bucket-0/data-c7aa5c7d-0f77-4567-abc3-066bdbd677b1-11.orc
-rwxrwx--x+ 2 hive supergroup 134621349 2024-07-05 08:50 /warehouse/rt_ods/paimon_xxxx_d/pt=20240530/bucket-0/data-c7aa5c7d-0f77-4567-abc3-066bdbd677b1-12.orc
-rwxrwx--x+ 2 hive supergroup 49324398 2024-07-05 08:50 /warehouse/rt_ods/paimon_xxxx_d/pt=20240530/bucket-0/data-c7aa5c7d-0f77-4567-abc3-066bdbd677b1-13.orc
-rwxrwx--x+ 2 hive supergroup 8976 2024-07-05 09:26 /warehouse/rt_ods/paimon_xxxx_d/pt=20240530/bucket-0/data-39bcc9ec-bdac-4e1a-96bb-1e8a2ec96b3d-10.orc
```

```
$hadoop fs -du -s -h /warehouse/rt_ods/paimon_xxxx_d/pt=20240530/
433.5 M  866.9 M  /warehouse/rt_ods/paimon_xxxx_d/pt=20240530
```

`count(1) where pt=20240530`

**non-Paimon table (parquet format):**

Query: `select * from dw_ods.tdb_xxxx_d where pt=20240530 limit 10`

```
$hadoop fs -du -s -h /ods/tdb_xxxx_d/pt=20240530
846.4 M  1.7 G  /ods/tdb_xxxx_d/pt=20240530
```

The file sizes and row counts of the two tables are similar, yet the `LIMIT` query on the Paimon table is noticeably slower than the same query on the non-Paimon table. It looks as if `LIMIT` does not take effect. Is that the case?

### Anything else?

_No response_

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
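The `hadoop fs` listings in this report can be cross-checked with a little arithmetic. The sketch below assumes that `hadoop fs -du -s -h` prints 1024-based units, that its second column is disk space consumed including HDFS replicas, and that the `2` in the `-ls` output is the replication factor. The replication factor of the Parquet table is not shown in the report, so applying the same factor of 2 to it is an additional assumption.

```python
# File sizes in bytes, copied from the `hadoop fs -ls` output for
# /warehouse/rt_ods/paimon_xxxx_d/pt=20240530/bucket-0
orc_file_sizes = [
    134_829_818,  # file ending in -2.orc
    135_744_147,  # file ending in -11.orc
    134_621_349,  # file ending in -12.orc
    49_324_398,   # file ending in -13.orc
    8_976,        # file ending in -10.orc
]

MIB = 1024 * 1024  # assumption: `-h` uses binary (1024-based) units
REPLICATION = 2    # the "2" column in the `-ls` output

# Paimon partition: logical size and space consumed with replicas
total_bytes = sum(orc_file_sizes)
logical_mib = total_bytes / MIB
raw_mib = logical_mib * REPLICATION

# Parquet partition: first `du` column, replication assumed to also be 2
parquet_logical_mib = 846.4
parquet_raw_gib = parquet_logical_mib * REPLICATION / 1024

print(f"paimon: {len(orc_file_sizes)} files, {total_bytes} bytes")
print(f"paimon: {logical_mib:.1f} M logical, {raw_mib:.1f} M with replicas")
print(f"parquet: {parquet_raw_gib:.1f} G with replicas")
```

The totals reproduce both `du` columns (433.5 M and 866.9 M for the Paimon partition, 1.7 G for the Parquet partition), so the Paimon partition consists of the five ORC files listed: four data files of roughly 47 to 130 MiB each plus one very small file.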
