adawrapub commented on PR #14351:
URL: https://github.com/apache/iceberg/pull/14351#issuecomment-3483128718

   > > Hello @nastra Thank you for your response and patch. I did apply all 
your changes which caused some of the test cases to fail. My understanding is 
Format v3+ uses PUFFIN format for position deletes. The positionDeletesReader 
method doesn't support PUFFIN format. Hence in test cases like 
testPositionDeleteWithRow, testPositionDeletes, testDeleteFrom, I added 
assumeThat(formatVersion) to not run for v3 and v4. Let me know if there is a 
better way to fix this. I did update FileHelpers.writePosDeleteFile and 
FileHelpers.writeDeleteFile to take format version.
   > 
   > Those remaining failing tests is the indicator of the underlying issue 
that needs to be fixed as part of this PR. We don't want to skip those tests 
for v3+ but we need to make rewriting table paths work for v3+, which is the 
scope of #13671 and which I also mentioned in [#14226 
(comment)](https://github.com/apache/iceberg/pull/14226#issuecomment-3354917294).
 You need to apply a similar approach to what has been done in #11657 in order 
to allow reading v3 deletes and thus make those tests pass
   
   DV files (Puffin format) store the referenced data file path in two separate 
locations:
   - Manifest metadata: the DeleteFile.referencedDataFile() field
   - Puffin blob metadata: the "referenced-data-file" property inside the blob

   The original implementation only updated the manifest metadata. The Puffin 
blob metadata still contained the old path, causing the DV reader to fail when 
applying deletes at the new location.
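
   To illustrate the mismatch described above, here is a minimal, 
self-contained sketch. It does not use Iceberg classes: appliesTo() and the 
example paths are hypothetical, and the equality check is only an assumed 
stand-in for whatever validation the real DV reader performs.

```java
import java.util.HashMap;
import java.util.Map;

public class StaleDvReferenceSketch {
    // Property name as quoted in this comment.
    static final String REFERENCED_DATA_FILE = "referenced-data-file";

    // Hypothetical check (an assumption, not Iceberg code): a DV is treated as
    // applying to a data file only if the manifest reference and the blob
    // property both point at that data file.
    static boolean appliesTo(
            String dataFilePath, String manifestReference, Map<String, String> blobProperties) {
        return dataFilePath.equals(manifestReference)
                && dataFilePath.equals(blobProperties.get(REFERENCED_DATA_FILE));
    }

    public static void main(String[] args) {
        // Example paths are made up for illustration.
        String movedDataFile = "s3://new-bucket/db/tbl/data/file-a.parquet";

        Map<String, String> blobProperties = new HashMap<>();
        blobProperties.put(REFERENCED_DATA_FILE, "s3://old-bucket/db/tbl/data/file-a.parquet");

        // Manifest entry already rewritten, blob property still stale.
        System.out.println(appliesTo(movedDataFile, movedDataFile, blobProperties));

        // Both copies rewritten.
        blobProperties.put(REFERENCED_DATA_FILE, movedDataFile);
        System.out.println(appliesTo(movedDataFile, movedDataFile, blobProperties));
    }
}
```

   Under these assumptions the first call reports a mismatch and the second 
a match, which is why both copies of the path need to be rewritten together.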
   
   I have implemented a two-pronged update strategy. Please let me know if 
there is a better way.
   
   1. Manifest Metadata Update
      - Added the ContentFileUtil.replaceReferencedDataFile() utility method
      - Created the RewriteTablePathUtil.newPositionDeleteEntry() helper method
      - Updates the referencedDataFile field in manifest entries
   2. Puffin Content Update
      - Implemented the RewriteTablePathUtil.rewriteDVFile() method
      - Reads Puffin files, updates blob metadata properties, and writes new files
      - Preserves the bitmap data while updating path references
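
   The Puffin content update can be sketched roughly as follows. This is a 
simplified model, not the code in the PR: the Blob class, rewritePath() and 
rewriteBlob() are hypothetical names, the blob is reduced to a property map 
plus an opaque byte payload, and the prefix-replacement rule is an assumption 
about how paths are mapped from the source to the target location.

```java
import java.util.HashMap;
import java.util.Map;

public class DvBlobRewriteSketch {
    // Property name as quoted in this comment.
    static final String REFERENCED_DATA_FILE = "referenced-data-file";

    // Hypothetical stand-in for a Puffin blob: metadata properties plus the
    // serialized bitmap, treated here as opaque bytes.
    static final class Blob {
        final Map<String, String> properties;
        final byte[] payload;

        Blob(Map<String, String> properties, byte[] payload) {
            this.properties = properties;
            this.payload = payload;
        }
    }

    // Assumed rule: swap the source prefix for the target prefix.
    static String rewritePath(String path, String sourcePrefix, String targetPrefix) {
        if (!path.startsWith(sourcePrefix)) {
            throw new IllegalArgumentException("Path is outside the source prefix: " + path);
        }
        return targetPrefix + path.substring(sourcePrefix.length());
    }

    // Returns a new blob with the path property rewritten; the payload bytes
    // are copied unchanged and the input blob is not modified.
    static Blob rewriteBlob(Blob blob, String sourcePrefix, String targetPrefix) {
        Map<String, String> properties = new HashMap<>(blob.properties);
        String oldPath = properties.get(REFERENCED_DATA_FILE);
        if (oldPath != null) {
            properties.put(REFERENCED_DATA_FILE, rewritePath(oldPath, sourcePrefix, targetPrefix));
        }
        return new Blob(properties, blob.payload.clone());
    }

    public static void main(String[] args) {
        Map<String, String> properties = new HashMap<>();
        properties.put(REFERENCED_DATA_FILE, "s3://old-bucket/db/tbl/data/file-a.parquet");
        Blob original = new Blob(properties, new byte[] {1, 2, 3});

        Blob rewritten = rewriteBlob(original, "s3://old-bucket/db/tbl", "s3://new-bucket/db/tbl");
        System.out.println(rewritten.properties.get(REFERENCED_DATA_FILE));
    }
}
```

   Copying the payload byte-for-byte reflects the point that only the path 
reference changes while the bitmap data stays the same.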


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

