DataGamePlay commented on issue #2067:
URL: https://github.com/apache/paimon/issues/2067#issuecomment-3722142630

   ## Reproduced on 0.8.1
   Same issue on **Paimon 0.8.1**.
   ## Problem Description
   Same data, same Hive engine, same query logic:
   - Hive table: 2 minutes
   - Paimon table: 6 minutes (still not returned)
   
   Same data, same Spark SQL engine:
   - Direct SELECT of the fields: very slow
   - SELECT of the same fields with GROUP BY: very fast
   
   3. Query with Hive engine:
   SELECT col1, col2 FROM paimon_table WHERE dt>='2025-01-01';
   -- 6 minutes, still not returned
   
   4. Query with Spark SQL:
   SELECT col1, col2 FROM paimon_table WHERE dt='2025-01-01';
   -- very slow
   
   SELECT col1, col2 FROM paimon_table WHERE dt='2025-01-01' GROUP BY col1, col2;
   -- very fast
   
   **Request**:  
   1. Please help analyze why a direct SELECT is very slow even after read optimization, and why the same query with GROUP BY is very fast.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
