steFaiz commented on PR #7812: URL: https://github.com/apache/paimon/pull/7812#issuecomment-4456587782
> We can do this modification:
>
> * BTreeIndexMeta remains unchanged (compatible with old data)
> * In BTreeFileMetaSelector, all comparison operations check whether getFirstKey()/getLastKey() is null before calling deserialize():

@JingsongLi Thanks for your advice! I think the main complications lie within BTreeIndexReader.

Currently, BTreeIndexReader stores minKey and maxKey to simplify many operations. Because of the filtering done by BTreeMetaSelector, BTreeIndexReader currently assumes that either both minKey and maxKey are null (the file contains only the nullBitmap) or neither is null. It then uses minKey and maxKey to simplify operations such as lessThan and greaterThan.

That assumption can break. For example, if a BTree file happens to store only empty-string keys, its minKey and maxKey will be deserialized as null from BTreeFileMeta, even though the file does contain keys.

I think the solution may be to stop storing minKey and maxKey in BTreeIndexReader and to change the rangeQuery implementation so that a null `from` means starting from the beginning and a null `to` means reading to the end.

What do you think? I could submit another PR to fix this in Java, Python, and Rust.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
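To illustrate the rangeQuery semantics proposed in the comment above (a null `from` starts at the beginning, a null `to` reads to the end), here is a minimal Python sketch. It is not the Paimon implementation: a sorted in-memory list stands in for the BTree file, and the names `range_query`, `less_than`, and `greater_than` are hypothetical, chosen only for this example.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional


def range_query(
    sorted_keys: List[str],
    from_key: Optional[str] = None,
    to_key: Optional[str] = None,
    from_inclusive: bool = True,
    to_inclusive: bool = True,
) -> List[str]:
    """Return the keys in [from_key, to_key] from a sorted list.

    Assumption for this sketch: None plays the role of null, so
    from_key=None means "start at the first key" and to_key=None means
    "read through the last key". No stored minKey/maxKey is consulted.
    """
    if from_key is None:
        lo = 0  # unbounded below: start from the beginning
    elif from_inclusive:
        lo = bisect_left(sorted_keys, from_key)
    else:
        lo = bisect_right(sorted_keys, from_key)

    if to_key is None:
        hi = len(sorted_keys)  # unbounded above: read to the end
    elif to_inclusive:
        hi = bisect_right(sorted_keys, to_key)
    else:
        hi = bisect_left(sorted_keys, to_key)

    # If lo >= hi the slice is empty, which is the correct result.
    return sorted_keys[lo:hi]


def less_than(sorted_keys: List[str], key: str) -> List[str]:
    # lessThan expressed as a range with no lower bound.
    return range_query(sorted_keys, to_key=key, to_inclusive=False)


def greater_than(sorted_keys: List[str], key: str) -> List[str]:
    # greaterThan expressed as a range with no upper bound.
    return range_query(sorted_keys, from_key=key, from_inclusive=False)
```

With this shape, a file that holds only empty-string keys needs no special handling: `less_than([""], "a")` scans from the start and finds the empty-string key without ever looking at a min or max bound.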
