wgtmac commented on PR #1014:
URL: https://github.com/apache/parquet-mr/pull/1014#issuecomment-1383160117

   > I agree that merging the key-value metadata is not an easy question. Let's 
discuss it separately as it is not related to this PR.
   > 
   > I also agree with storing the current writer (parquet-mr) in `created_by` in 
case of rewriting. It is not easy to decide what would be the proper solution 
anyway. `created_by` is usually used for handling potential erroneous writes. 
Let's say there was an issue in parquet-mr at version 1.2.3 that wrote a 
specific encoding of integers wrongly (not according to the spec). What if we 
rewrite the file but do not re-encode the pages? Can we still handle the 
original issue? What if the rewriter re-encodes the related pages? Let's store 
the original writer in `original.created.by` for now. Let's discuss this topic 
separately; however, I am not sure we can find a proper solution.
   
   I agree. I have now updated this PR to preserve the original writer version in 
`original.created.by` and added a test to make sure it is preserved. Please 
take a look. Thanks!
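   A minimal sketch of the behavior described above, assuming the rewriter has the input footer's `created_by` string and its key-value metadata available as a plain `Map<String, String>`. The class and method names (`CreatedByExample`, `preserveOriginalCreatedBy`) are hypothetical and are not parquet-mr APIs; keeping the earliest value when a file is rewritten more than once is also an assumption, not something this PR states.

```java
import java.util.HashMap;
import java.util.Map;

public class CreatedByExample {
    // Key name taken from the discussion; everything else here is hypothetical.
    static final String ORIGINAL_CREATED_BY_KEY = "original.created.by";

    // Returns a copy of the input file's key-value metadata with the original
    // writer recorded. The new file's own created_by is assumed to be set
    // separately to the current writer. If the key already exists (the file was
    // rewritten before), the earliest recorded writer is kept (an assumption).
    static Map<String, String> preserveOriginalCreatedBy(
            Map<String, String> keyValueMetadata, String originalCreatedBy) {
        Map<String, String> result = new HashMap<>(keyValueMetadata);
        if (originalCreatedBy != null) {
            result.putIfAbsent(ORIGINAL_CREATED_BY_KEY, originalCreatedBy);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> kv = new HashMap<>();
        kv.put("some.user.key", "value");

        // First rewrite: record the writer of the input file.
        Map<String, String> once = preserveOriginalCreatedBy(kv, "parquet-mr version 1.2.3");
        System.out.println(once.get(ORIGINAL_CREATED_BY_KEY));

        // Second rewrite: the earliest writer is kept under this sketch's policy.
        Map<String, String> twice = preserveOriginalCreatedBy(once, "parquet-mr version 1.13.0");
        System.out.println(twice.get(ORIGINAL_CREATED_BY_KEY));
    }
}
```

   Using `putIfAbsent` means a reader can still see which writer originally encoded the pages even after several rewrites, which is the case the quoted comment worries about when pages are copied without re-encoding.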


