[GitHub] [iceberg] szehon-ho edited a comment on issue #2326: Partition Table Performance

GitBox Sat, 13 Mar 2021 14:45:22 -0800


szehon-ho edited a comment on issue #2326:
URL: https://github.com/apache/iceberg/issues/2326#issuecomment-798797152



   Nice, I think aggregate pushdown could potentially enable many faster 
queries in Iceberg overall, given how much metadata we have, if we are OK with 
answering queries from that metadata.
   
   I think #2182 is interesting, but it may not be worth the cost if it adds 
a lot more time to each commit.
   
   @aokolnychyi  I tried running the equivalent query on the files table 
today, and it's much faster (about a minute vs. tens of minutes).  I even added 
a DISTINCT at the end and it did not add much time.  It's a shame that users 
try the partitions table first, rather than the files table, when listing 
partitions.  I guess there's not much we can do unless we have the Spark view 
catalog support mentioned in that PR?
   
   By the way, as Russell pointed out to me, I was looking at an improvement: 
adding predicate pushdown that uses ManifestEvaluator to filter out manifest 
files, since the manifest list already has each manifest file's partition 
min/max.  If I understand correctly, this requires converting a filter on the 
partitions table (partition.part_field = x) into a "partition filter" 
(part_field = x).  That makes me wonder whether this functionality will be 
compatible with later rewriting the partitions table as a view of the files 
table.  I guess it would be, as we should have an equivalent partition filter 
on the files table as well (partition.part_field = x, which is not there today).
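
   To make the filter conversion concrete, here is a rough sketch in plain 
Python.  It is not Iceberg code and does not use ManifestEvaluator; the names 
ManifestSummary, to_partition_filter, might_match and prune_manifests are 
hypothetical, and it assumes a single equality predicate on an integer 
partition field with per-manifest (min, max) bounds.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Prefix used by the metadata table column (partition.part_field).
PREFIX = "partition."


@dataclass
class ManifestSummary:
    """Hypothetical stand-in for a manifest list entry.

    bounds maps a partition field name to its (min, max) values.
    """
    path: str
    bounds: Dict[str, Tuple[int, int]]


def to_partition_filter(column: str, value: int) -> Tuple[str, int]:
    """Rewrite `partition.part_field = x` into `part_field = x`."""
    if not column.startswith(PREFIX):
        raise ValueError(f"not a partition column: {column}")
    return column[len(PREFIX):], value


def might_match(manifest: ManifestSummary, field: str, value: int) -> bool:
    """Return False only when the bounds prove no row can match."""
    bounds = manifest.bounds.get(field)
    if bounds is None:
        # No statistics for this field: the manifest cannot be pruned.
        return True
    lower, upper = bounds
    return lower <= value <= upper


def prune_manifests(
    manifests: List[ManifestSummary], column: str, value: int
) -> List[ManifestSummary]:
    """Keep only the manifests that may contain matching partitions."""
    field, wanted = to_partition_filter(column, value)
    return [m for m in manifests if might_match(m, field, wanted)]
```

   The check is deliberately conservative: a manifest is dropped only when its 
bounds exclude the value, so a manifest without bounds is always kept and 
still has to be read.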
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



