steFaiz commented on PR #7812:
URL: https://github.com/apache/paimon/pull/7812#issuecomment-4456587782

   > We can do this modification:
   > 
   > * BTreeIndexMeta remains unchanged (compatible with old data)
   > * In BTreeFileMetaSelector, all comparison operations check whether 
getFirstKey()/getLastKey() is null before calling deserialize():
   >   
   
   @JingsongLi  Thanks for your advice! I think the main complications lie 
within BTreeIndexReader. Currently, BTreeIndexReader stores minKey and maxKey 
to simplify many operations. Due to filtering by BTreeMetaSelector, 
BTreeIndexReader currently assumes that either both minKey and maxKey are null 
(containing only the nullBitmap) or neither is null. It then uses minKey and 
maxKey to simplify operations such as lessThan and greaterThan.
   
   For example, if a btree file happens to store only empty-string keys, 
its minKey and maxKey will be deserialized as null from BTreeFileMeta.
   
   I think the solution may be to stop storing minKey and maxKey in 
BTreeIndexReader and to change the rangeQuery implementation so that a null 
`from` means starting from the beginning and a null `to` means reading to the end. 
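
A minimal sketch of the open-ended range semantics proposed above. It is not the actual BTreeIndexReader code: the class `OpenRangeSketch`, the `rangeQuery` signature, and the use of a `TreeMap` as a stand-in for a btree file are all assumptions made for illustration. The point is that a null bound is treated as "unbounded" rather than compared against a stored minKey/maxKey, so a file whose only key is the empty string still works.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration only; names do not come from the Paimon code base.
final class OpenRangeSketch {

    // Returns the row ids whose key lies in [from, to], both bounds inclusive.
    // A null `from` means "start from the beginning"; a null `to` means
    // "read to the end". No minKey/maxKey is consulted.
    static List<Integer> rangeQuery(NavigableMap<String, Integer> index, String from, String to) {
        NavigableMap<String, Integer> view;
        if (from != null && to != null) {
            if (from.compareTo(to) > 0) {
                return new ArrayList<>(); // empty range, nothing to read
            }
            view = index.subMap(from, true, to, true);
        } else if (from != null) {
            view = index.tailMap(from, true);
        } else if (to != null) {
            view = index.headMap(to, true);
        } else {
            view = index; // both bounds open: full scan
        }
        return new ArrayList<>(view.values());
    }

    public static void main(String[] args) {
        // Sorted key -> row id; the empty string is an ordinary, valid key here.
        NavigableMap<String, Integer> index = new TreeMap<>();
        index.put("", 0);
        index.put("a", 1);
        index.put("b", 2);

        System.out.println(rangeQuery(index, null, null)); // [0, 1, 2]
        System.out.println(rangeQuery(index, null, "a"));  // [0, 1]
        System.out.println(rangeQuery(index, "a", null));  // [1, 2]
        System.out.println(rangeQuery(index, "", ""));     // [0]
    }
}
```

With this shape, lessThan and greaterThan style lookups could be expressed as range queries with one open bound, though whether that fits the real reader is something the follow-up PR would need to confirm.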
   
   What do you think? I could submit another PR to fix this in Java, 
Python, and Rust.


