steFaiz commented on PR #7889: URL: https://github.com/apache/paimon/pull/7889#issuecomment-4532233842
@JingsongLi Thanks! 1 & 2 are addressed. Here are some explanations for the rest:

> 3. Memory pressure from ForceSingleBatchReader wrapping all group readers

Yes, this is a problem, and I marked it as a TODO in my original implementation. I think it can be solved in one of two ways:
1. Introduce a `discard` method that discards the iterator's next record. For blobs, we can just move to the next one.
2. Read all blobs as descriptors. But this may break the optimization introduced in https://github.com/apache/paimon/pull/6989

I'm happy to keep optimizing this, but I think it can be done in future PRs.

> 4. Singleton placeholder row reuse in BlobSequenceGroupRecordReader

This is safe because the placeholder is never read: every method call on it throws an error.

> 5. BlobFileBunch doesn't validate schemaId across files (by design?)

This is by design; it was changed intentionally in https://github.com/apache/paimon/pull/7618

> 6. DataEvolutionFileReader contract relaxation

This is for the blob-only projection case. Even if only one blob field is selected, the number of data files in each blob bunch is still > 1 because of updates, so the code path still goes through `createUnionReader`. Refactoring the whole code would be too noisy for this PR, and this relaxation does not affect correctness or efficiency. It could be optimized in separate PRs.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
