adawrapub commented on PR #14351:
URL: https://github.com/apache/iceberg/pull/14351#issuecomment-3483128718

> > Hello @nastra Thank you for your response and patch. I applied all your changes, which caused some of the test cases to fail. My understanding is that format v3+ uses the PUFFIN format for position deletes, and the positionDeletesReader method doesn't support PUFFIN. Hence, in test cases like testPositionDeleteWithRow, testPositionDeletes, and testDeleteFrom, I added assumeThat(formatVersion) so they do not run for v3 and v4. Let me know if there is a better way to fix this. I did update FileHelpers.writePosDeleteFile and FileHelpers.writeDeleteFile to take the format version.
>
> Those remaining failing tests are the indicator of the underlying issue that needs to be fixed as part of this PR. We don't want to skip those tests for v3+, but we need to make rewriting table paths work for v3+, which is the scope of #13671 and which I also mentioned in [#14226 (comment)](https://github.com/apache/iceberg/pull/14226#issuecomment-3354917294). You need to apply a similar approach to what has been done in #11657 in order to allow reading v3 deletes and thus make those tests pass.

DV files (Puffin format) store the referenced data file path in two separate locations:

- Manifest metadata: the DeleteFile.referencedDataFile() field
- Puffin blob metadata: the "referenced-data-file" property inside the blob

The original implementation only updated the manifest metadata. The Puffin blob metadata still contained the old path, causing the DV reader to fail when applying deletes at the new location.

I have implemented a two-pronged update strategy. Please let me know if there is a better way.

1. Manifest Metadata Update
   - Added the ContentFileUtil.replaceReferencedDataFile() utility method
   - Created the RewriteTablePathUtil.newPositionDeleteEntry() helper method
   - Updates the referencedDataFile field in manifest entries
2. Puffin Content Update
   - Implemented the RewriteTablePathUtil.rewriteDVFile() method
   - Reads Puffin files, updates blob metadata properties, and writes new files
   - Preserves the bitmap data while updating path references

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
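
For illustration only, the blob-property part of the Puffin update described in the comment above can be sketched in plain Java. This is a minimal sketch, not the code from the PR: the class DvPathRewriteSketch and its methods rewritePath and rewriteBlobProperties are hypothetical names, the blob properties are modeled as an ordinary Map rather than Iceberg's Puffin classes, and the rewrite is assumed to be a simple source-prefix to target-prefix substitution. Only the "referenced-data-file" key comes from the comment itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the path rewrite on DV blob properties; not the PR's implementation. */
public class DvPathRewriteSketch {
  // Property key named in the comment above.
  static final String REFERENCED_DATA_FILE = "referenced-data-file";

  /** Replaces sourcePrefix with targetPrefix at the start of path (assumed rewrite rule). */
  static String rewritePath(String path, String sourcePrefix, String targetPrefix) {
    if (!path.startsWith(sourcePrefix)) {
      throw new IllegalArgumentException(
          "Path " + path + " does not start with " + sourcePrefix);
    }
    return targetPrefix + path.substring(sourcePrefix.length());
  }

  /**
   * Returns a copy of the blob properties in which only the referenced data
   * file path is rewritten; every other property is carried over unchanged.
   */
  static Map<String, String> rewriteBlobProperties(
      Map<String, String> properties, String sourcePrefix, String targetPrefix) {
    Map<String, String> updated = new LinkedHashMap<>(properties);
    String oldPath = updated.get(REFERENCED_DATA_FILE);
    if (oldPath != null) {
      updated.put(REFERENCED_DATA_FILE, rewritePath(oldPath, sourcePrefix, targetPrefix));
    }
    return updated;
  }
}
```

In this sketch the input map is never mutated, which mirrors the idea of writing a new Puffin file instead of editing the old one in place. The bitmap bytes are not modeled here; per the comment, they would be copied through unchanged.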
