deniskuzZ commented on PR #5215:
URL: https://github.com/apache/hive/pull/5215#issuecomment-2222272885

   > > I don't know the details and need to check the docs, but if Iceberg exposes 
`record_count` via the partitions metadata table, why would the select be expensive? 
It's just a 1-row fetch with a spec filter?
   > 
   > If the table has many manifest files as well as data files, I think 
getting the `record_count` is a little expensive, as the **record_count of a 
partition** is the sum of the `record_count` of all its data files, so Iceberg 
needs to go through all the data file entries in the manifest files to get this 
value. But I think we can tolerate this cost in most cases as long as the table 
is not huge. So we can try it the way you suggested, and refine it 
further once the Iceberg repo has finished the partition stats API.
   > 
   > > > iceberg partition row count is that in Hive base code we regard 
iceberg table as non-partitioned table, and so some partition prune 
optimization like `HivePartitionPruneRule`
   > > 
   > > 
   > > btw, Hive supports partition pruning for Iceberg tables
   > 
   > Yes, I guess you are referring to 
[HIVE-24962](https://issues.apache.org/jira/browse/HIVE-24962). We do have the 
ability to prune Iceberg partitions when scanning data, thanks to 
[HIVE-24962](https://issues.apache.org/jira/browse/HIVE-24962). But some 
existing optimization rules like `HivePartitionPruneRule` are not applied to 
partitioned Iceberg tables. They are applied to regular Hive tables, where they 
determine whether the partitioned table can use `StatsOptimizer::transform` for 
the `count(*)` optimization.
   > 
   > So we may need some changes to let optimization rules like 
`HivePartitionPruneRule` know that partitioned Iceberg tables also support 
partition pruning.
   
   FYI
   https://github.com/apache/iceberg/pull/8502
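
   To make the cost discussed in the quoted reply concrete, here is a minimal sketch of why a per-partition `record_count` requires touching every data file entry. It does not use the Iceberg API: `DataFileEntry` and `recordCountByPartition` are hypothetical names for illustration, and a real implementation would read these entries from manifest files.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionRecordCount {

    // Hypothetical stand-in for one data file entry in a manifest:
    // the partition the file belongs to and that file's record_count.
    public static final class DataFileEntry {
        final String partition;
        final long recordCount;

        public DataFileEntry(String partition, long recordCount) {
            this.partition = partition;
            this.recordCount = recordCount;
        }
    }

    // Visits every entry exactly once and accumulates record_count per partition,
    // so the work grows with the number of data file entries, not partitions.
    public static Map<String, Long> recordCountByPartition(List<DataFileEntry> entries) {
        Map<String, Long> totals = new TreeMap<>();
        for (DataFileEntry e : entries) {
            totals.merge(e.partition, e.recordCount, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<DataFileEntry> entries = List.of(
            new DataFileEntry("dt=2024-07-01", 100L),
            new DataFileEntry("dt=2024-07-01", 250L),
            new DataFileEntry("dt=2024-07-02", 40L));
        // prints {dt=2024-07-01=350, dt=2024-07-02=40}
        System.out.println(recordCountByPartition(entries));
    }
}
```

   With precomputed partition stats, the same answer could presumably be a lookup rather than a scan over all entries, which is the refinement the quoted reply defers to later.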


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

