moranrr opened a new issue, #5773: URL: https://github.com/apache/paimon/issues/5773
### Search before asking

- [x] I searched in the [issues](https://github.com/apache/paimon/issues) and found nothing similar.

### Paimon version

paimon 1.1.1

### Compute Engine

Flink 1.18.1, Spark 3.5.1

### Minimal reproduce step

Flink writes data to table T_1 in real time. Table T_1 is configured with `'bucket' = '80'` and `'snapshot.time-retained' = '12 h'`. Each partition holds roughly 6 billion records per day, and the data is distributed fairly evenly. I then run the following Spark SQL to count the records in each partition (`dt` is the partition field):

```
select dt, count(1) from paimon.test.data_detail group by dt
```

1. With `spark.driver.memory` set to 6G, the query fails with:

   **Caused by: java.lang.OutOfMemoryError: Java heap space**

   Why does counting records grouped by the date partition field need so much memory?

2. If I increase the Spark driver memory, the query becomes extremely slow; after more than ten minutes, sometimes half an hour, it fails with:

   ```
   Caused by: java.lang.ClassCastException: org.apache.paimon.data.BinaryString cannot be cast to java.lang.Integer
   ```

### What doesn't meet your expectations?

1. Why does simply counting the rows per partition require so much memory?
2. Why does a type conversion error occur?

### Anything else?

_No response_

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!
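For reference, the table setup described in the reproduce step roughly corresponds to a DDL like the sketch below. The column names, types, and bucket key are assumptions for illustration only; just the `bucket` and `snapshot.time-retained` values come from the report above.

```
-- Hypothetical Flink SQL DDL approximating the table described above;
-- column names, types, and the bucket key are assumptions, not taken from the real job.
CREATE TABLE paimon.test.data_detail (
    id      BIGINT,
    payload STRING,
    dt      STRING
) PARTITIONED BY (dt) WITH (
    'bucket' = '80',                     -- value from the report
    'bucket-key' = 'id',                 -- assumed; a fixed-bucket append table needs a bucket key
    'snapshot.time-retained' = '12 h'    -- value from the report
);
```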