Re: [PR] [core] introduce Placeholder for Blob File Format [paimon]

via GitHub Mon, 25 May 2026 00:01:04 -0700


steFaiz commented on PR #7889:
URL: https://github.com/apache/paimon/pull/7889#issuecomment-4532233842


   @JingsongLi Thanks! Points 1 and 2 are addressed. Some explanation for the rest:
   > 3. Memory pressure from ForceSingleBatchReader wrapping all group readers
   
   Yes, this is a problem, and I marked it as a TODO in my original implementation. I think it can be solved in one of two ways:
   1. Introduce a `discard` method that skips the iterator's next record. For blob files, this can simply move on to the next record.
   2. Read all blobs as descriptors. However, this may break the optimization introduced in https://github.com/apache/paimon/pull/6989
   
   I'm happy to keep optimizing this, but I think it can be done in future PRs.
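   A minimal sketch of option 1, for illustration only: `DiscardableIterator` and `InMemoryBlobIterator` are hypothetical names, not Paimon's `RecordReader` API. The idea it shows is that `discard()` only advances a cursor, so a reader that has to stay row-aligned with the other group readers could skip blobs it does not need without copying their bytes.

```java
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical iterator with a discard() operation; not the Paimon RecordReader API.
interface DiscardableIterator<T> {
    boolean hasNext();

    // Materializes and returns the next record.
    T next();

    // Skips the next record without materializing it.
    void discard();
}

// Toy blob iterator over in-memory blobs; "materializing" here means copying the bytes.
class InMemoryBlobIterator implements DiscardableIterator<byte[]> {
    private final List<byte[]> blobs;
    private int pos = 0;
    // Total bytes copied by next(), to show that discard() copies nothing.
    int bytesCopied = 0;

    InMemoryBlobIterator(List<byte[]> blobs) {
        this.blobs = blobs;
    }

    @Override
    public boolean hasNext() {
        return pos < blobs.size();
    }

    @Override
    public byte[] next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        byte[] copy = blobs.get(pos++).clone();
        bytesCopied += copy.length;
        return copy;
    }

    @Override
    public void discard() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        pos++; // only move the cursor; the blob bytes are never touched
    }
}
```

   For example, over blobs of 100, 200 and 300 bytes, calling `discard()`, `next()`, `discard()` returns only the 200-byte blob and leaves `bytesCopied` at 200.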
   
   > 4. Singleton placeholder row reuse in BlobSequenceGroupRecordReader
   
   This is safe because the placeholder is never read: every method call on it throws an error.
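   To illustrate why sharing one instance is safe, here is a hypothetical sketch (`SketchRow` and `PlaceholderRow` are illustrative names, not Paimon classes): the placeholder carries no state and every accessor throws, so reusing a single instance cannot leak data from one row position to another.

```java
// Hypothetical minimal row interface; not Paimon's InternalRow.
interface SketchRow {
    int getFieldCount();

    byte[] getBlob(int pos);
}

// Stateless placeholder: it holds no data and every accessor throws,
// so a single shared instance can stand in for every placeholder position.
final class PlaceholderRow implements SketchRow {
    static final PlaceholderRow INSTANCE = new PlaceholderRow();

    private PlaceholderRow() {}

    @Override
    public int getFieldCount() {
        throw new UnsupportedOperationException("placeholder row must never be read");
    }

    @Override
    public byte[] getBlob(int pos) {
        throw new UnsupportedOperationException("placeholder row must never be read");
    }
}
```

   In this sketch, any accidental read fails loudly instead of returning stale data, which is what makes the reuse harmless.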
   
   > 5. BlobFileBunch doesn't validate schemaId across files (by design?)
   
   This is intentional, and was addressed in https://github.com/apache/paimon/pull/7618
   
   > 6. DataEvolutionFileReader contract relaxation
   
   This is for blob-only projections. Even if only one blob field is selected, each blob bunch can still contain more than one data file because of updates, so the code path still goes through `createUnionReader`.
   Refactoring the whole code path would add too much noise to this PR, and this relaxation affects neither correctness nor efficiency. It could be optimized in a separate PR.
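   A hypothetical sketch of the dispatch described above, assuming for illustration that the choice depends only on how many data files a blob bunch holds. `ReaderDispatchSketch` and `choosePath` are made-up names, not the actual `DataEvolutionFileReader` code.

```java
import java.util.List;

// Hypothetical sketch of the reader dispatch; not the actual DataEvolutionFileReader code.
final class ReaderDispatchSketch {
    enum Path { SINGLE_FILE, UNION }

    // Assumption for illustration: the path depends only on the file count of the bunch.
    // With updates, a bunch can hold several files even when only one blob field is
    // projected, so the union path is still chosen.
    static Path choosePath(List<String> filesInBunch) {
        if (filesInBunch.isEmpty()) {
            throw new IllegalArgumentException("a blob bunch must contain at least one file");
        }
        return filesInBunch.size() > 1 ? Path.UNION : Path.SINGLE_FILE;
    }

    private ReaderDispatchSketch() {}
}
```

   Under this assumption, a projection of a single blob field over an updated bunch (two or more files) still lands on the union path, which is why the relaxed contract is needed.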


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
