suxiaogang223 opened a new issue, #3313: URL: https://github.com/apache/paimon/issues/3313
### Search before asking

- [X] I searched in the [issues](https://github.com/apache/paimon/issues) and found nothing similar.

### Paimon version

0.8-SNAPSHOT

### Compute Engine

JavaAPI

### Minimal reproduce step

Nothing to do

### What doesn't meet your expectations?

I'm trying to support deletion vectors in Doris' PaimonNativeReader. When I use the offset and length from DeletionFile to read the content of the HDFS file into local memory, deserializing that content into a RoaringBitmap fails. I found that the correct way is to read length + 4 bytes starting at the offset. I guess these 4 extra bytes hold the serialized length of the DeletionVector, which is written in front of it when the DeletionVector is stored in the index file.

```java
static DeletionVector read(FileIO fileIO, DeletionFile deletionFile) throws IOException {
    Path path = new Path(deletionFile.path());
    try (SeekableInputStream input = fileIO.newInputStream(path)) {
        input.seek(deletionFile.offset());
        DataInputStream dis = new DataInputStream(input);
        int actualLength = dis.readInt();
        if (actualLength != deletionFile.length()) {
            throw new RuntimeException(
                    "Size not match, actual size: "
                            + actualLength
                            + ", expert size: "
                            + deletionFile.length()
                            + ", file path: "
                            + path);
        }
        int magicNum = dis.readInt();
        if (magicNum == BitmapDeletionVector.MAGIC_NUMBER) {
            return BitmapDeletionVector.deserializeFromDataInput(dis);
        } else {
            throw new RuntimeException("Invalid magic number: " + magicNum);
        }
    }
}
```

Maybe we should add 4 to the length or the offset in DeletionFile, because the current behavior is very confusing: the offset points at the 4-byte length prefix, while the length does not count that prefix.

### Anything else?

_No response_

### Are you willing to submit a PR?

- [X] I'm willing to submit a PR!
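To make the byte layout behind the report concrete, here is a minimal, self-contained sketch in plain Java. It is not Paimon code. It assumes that each entry is stored as `[4-byte length][4-byte magic number][bitmap bytes]`, and that the recorded length covers the magic number plus the bitmap bytes but not the length prefix, which is what the `read` method above and the observed "length + 4" suggest. The class name, the helper methods, and the `MAGIC_NUMBER` value are hypothetical placeholders.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class DeletionVectorLayoutSketch {
    // Hypothetical placeholder value; the real constant is BitmapDeletionVector.MAGIC_NUMBER.
    static final int MAGIC_NUMBER = 0x5A5A5A5A;
    // Size of the int length prefix assumed to precede every entry.
    static final int LENGTH_PREFIX_BYTES = Integer.BYTES;

    /** Builds one entry as [int length][int magic][bitmap bytes]; length = 4 + bitmapBytes.length. */
    public static byte[] writeEntry(byte[] bitmapBytes) {
        int length = Integer.BYTES + bitmapBytes.length; // magic number + bitmap payload
        ByteBuffer buf = ByteBuffer.allocate(LENGTH_PREFIX_BYTES + length);
        buf.putInt(length);       // big-endian, like DataOutputStream.writeInt
        buf.putInt(MAGIC_NUMBER);
        buf.put(bitmapBytes);
        return buf.array();
    }

    /** Bytes a remote reader must fetch from the offset: the recorded length plus the prefix. */
    public static int bytesToFetch(int recordedLength) {
        return LENGTH_PREFIX_BYTES + recordedLength;
    }

    /** Mirrors the checks in the read method above and returns the raw bitmap bytes. */
    public static byte[] readEntry(byte[] file, int offset, int recordedLength) {
        ByteBuffer buf = ByteBuffer.wrap(file, offset, bytesToFetch(recordedLength));
        int actualLength = buf.getInt();
        if (actualLength != recordedLength) {
            throw new IllegalStateException(
                    "Size mismatch, actual: " + actualLength + ", expected: " + recordedLength);
        }
        int magic = buf.getInt();
        if (magic != MAGIC_NUMBER) {
            throw new IllegalStateException("Invalid magic number: " + magic);
        }
        byte[] bitmapBytes = new byte[recordedLength - Integer.BYTES];
        buf.get(bitmapBytes);
        return bitmapBytes;
    }

    public static void main(String[] args) {
        byte[] payload = {1, 2, 3, 4, 5};
        byte[] file = writeEntry(payload);
        int recordedLength = Integer.BYTES + payload.length;
        System.out.println("bytes to fetch: " + bytesToFetch(recordedLength));
        System.out.println("round trip ok: "
                + Arrays.equals(payload, readEntry(file, 0, recordedLength)));
    }
}
```

Under these assumptions, a reader that copies only `length` bytes from `offset` ends up 4 bytes short, so the bitmap payload is truncated. That would explain the deserialization error described above.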
