shangxinli commented on pull request #1566: URL: https://github.com/apache/iceberg/pull/1566#issuecomment-721809179
@rdblue, that makes sense. The question now is whether we also want to reimplement the ColumnIndex filter, and the same question applies to the future bloom filter. Since Iceberg has already reimplemented the other filter types, it makes more sense to do the same here. It is more effort, but I can do it if Iceberg decides to reimplement all of the Parquet filters.

If we don't reimplement it, we might need Parquet to release 1.11.2 with a fix that exposes the skipped record count (we discussed this in the last Parquet meeting, and a release is possible if needed). But we would still have the issue you mentioned above (IN, STARTS_WITH, etc. are not supported).

@shardulm94, I haven't addressed your comments yet. If we decide to reimplement the ColumnIndex filter, we cannot call readNextRowGroupFilter(). Instead, we will filter pages ourselves (maybe in BaseColumnIterator? I need to look into it more). So I will address them once we have agreement here.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
