gaborkaszab commented on PR #14508:
URL: https://github.com/apache/iceberg/pull/14508#issuecomment-3491094392

   Some background: The current way to query partition stats is through 
`PartitionStatsHandler.readPartitionStatsFile()`. For this, the user has to put 
together the schema and get the input file to read. For easier usability, it 
would be beneficial (as one comment on [my stats proposal 
doc](https://docs.google.com/document/d/1H9uYt53Q1_CcOXOfLcr0hXRxvqflg_k_xeVorMLrWbM/edit?pli=1&disco=AAABoDRdYcw)
 also mentions) to have a more convenient API to scan partition stats. This could 
also have filter and projection capabilities.
   
   The content of this PR:
   1) Introduce the `PartitionStatisticsScan` API and its implementation 
`BasePartitionStatisticsScan` in core. For simplicity, this only offers the 
functionality that exists today: no filtering by partition and no projection.
   2) Replace the usage of `PartitionStatsHandler.readPartitionStatsFile()` 
with the new API
   3) Introduce the `PartitionStatistics` interface in the API module and make 
`PartitionStats` in core implement it. This is needed so that the Scan API 
can use the interface as its return type, since the existing `PartitionStats` 
class lives in the core module.
   4) Replace the usage of `PartitionStats` with the new interface wherever 
possible.
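
   To illustrate the shape of points 1) and 3), here is a minimal, self-contained sketch. It only mirrors the type names from this PR; the method names (`useSnapshot`, `scan`, `partition`, `dataRecordCount`), the in-memory map, and the `sampleScan` and `totalRecords` helpers are assumptions made for illustration and are not the actual Iceberg signatures.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: stand-ins that mirror the names in this PR,
// not the real Iceberg classes or method signatures.
class PartitionStatsScanSketch {

  // Stand-in for the interface that lives in the API module.
  interface PartitionStatistics {
    String partition();

    long dataRecordCount();
  }

  // Stand-in for the existing core class, now implementing the interface.
  static final class PartitionStats implements PartitionStatistics {
    private final String partition;
    private final long dataRecordCount;

    PartitionStats(String partition, long dataRecordCount) {
      this.partition = partition;
      this.dataRecordCount = dataRecordCount;
    }

    @Override
    public String partition() {
      return partition;
    }

    @Override
    public long dataRecordCount() {
      return dataRecordCount;
    }
  }

  // Stand-in for the scan API: no filtering and no projection yet.
  interface PartitionStatisticsScan {
    PartitionStatisticsScan useSnapshot(long snapshotId);

    List<PartitionStatistics> scan();
  }

  // The caller no longer assembles a schema or locates an input file;
  // here an in-memory map (an assumption) plays the role of the stats files.
  static final class BasePartitionStatisticsScan implements PartitionStatisticsScan {
    private final Map<Long, List<PartitionStatistics>> statsBySnapshot;
    private final long snapshotId;

    BasePartitionStatisticsScan(
        Map<Long, List<PartitionStatistics>> statsBySnapshot, long snapshotId) {
      this.statsBySnapshot = statsBySnapshot;
      this.snapshotId = snapshotId;
    }

    @Override
    public PartitionStatisticsScan useSnapshot(long newSnapshotId) {
      // Return a new scan instead of mutating this one.
      return new BasePartitionStatisticsScan(statsBySnapshot, newSnapshotId);
    }

    @Override
    public List<PartitionStatistics> scan() {
      return statsBySnapshot.getOrDefault(snapshotId, Collections.emptyList());
    }
  }

  // Callers depend only on the interface, never on PartitionStats directly.
  static long totalRecords(PartitionStatisticsScan scan) {
    long total = 0L;
    for (PartitionStatistics stats : scan.scan()) {
      total += stats.dataRecordCount();
    }
    return total;
  }

  // Sample data: snapshot 1 has one partition, snapshot 2 (current) has two.
  static PartitionStatisticsScan sampleScan() {
    Map<Long, List<PartitionStatistics>> stats =
        Map.of(
            1L, List.<PartitionStatistics>of(new PartitionStats("day=2024-01-01", 10L)),
            2L, List.<PartitionStatistics>of(
                    new PartitionStats("day=2024-01-01", 10L),
                    new PartitionStats("day=2024-01-02", 5L)));
    return new BasePartitionStatisticsScan(stats, 2L);
  }

  public static void main(String[] args) {
    PartitionStatisticsScan scan = sampleScan();
    System.out.println(totalRecords(scan));
    System.out.println(totalRecords(scan.useSnapshot(1L)));
  }
}
```

   Because callers only see the interface, the implementing class could later be swapped or renamed without touching them.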
   
   Possible follow-up steps:
   1) Implementation of filter() and project() on the new Scan API
   2) The naming of the affected classes is a bit odd: the interface 
`api/PartitionStatistics` is implemented by `core/PartitionStats`. Ideally, 
the implementation would be named `BasePartitionStatistics`. As a next 
step, we can introduce a class with the same content under the new name, 
deprecate the existing one, and remove its usages. Changes within 
`PartitionStats` are easier to review if the "renaming" happens in a follow-up PR.
   3) Older Spark versions should be covered


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

