wgtmac commented on PR #1014:
URL: https://github.com/apache/parquet-mr/pull/1014#issuecomment-1383101451

   > > I am afraid some implementations may drop characters after `'\n'` when displaying the string content. Let me do some investigation.
   > 
   > I do not have a strong opinion on `'\n'`, only that we need a character that is unlikely to be used by any system writing Parquet files.
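   To illustrate the separator requirement quoted above, here is a minimal Java sketch. It assumes the successive `created_by` values are concatenated into a single key-value string joined by `'\n'`; `CreatedByHistory`, `append`, and `split` are hypothetical helpers, not parquet-mr API. The check in `append` shows the property the chosen character must have: if any writer's `created_by` already contains it, the history can no longer be split back apart.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: packs a chain of writer versions into one string.
public class CreatedByHistory {
    // Assumption: '\n' is the separator under discussion; any character that
    // writers are unlikely to emit in created_by would work the same way.
    static final String SEPARATOR = "\n";

    // Appends one created_by value, rejecting values that contain the
    // separator because they could not be recovered by split().
    static String append(String history, String createdBy) {
        if (createdBy.contains(SEPARATOR)) {
            throw new IllegalArgumentException("created_by contains the separator: " + createdBy);
        }
        if (history == null || history.isEmpty()) {
            return createdBy;
        }
        return history + SEPARATOR + createdBy;
    }

    // Splits the stored string back into individual writer versions.
    static List<String> split(String history) {
        if (history == null || history.isEmpty()) {
            return new ArrayList<>();
        }
        // Pattern.quote keeps this correct even if SEPARATOR is a regex metacharacter.
        return new ArrayList<>(Arrays.asList(history.split(Pattern.quote(SEPARATOR), -1)));
    }

    public static void main(String[] args) {
        // Hypothetical writer strings, for illustration only.
        String h = append(null, "writer-a version 1.0");
        h = append(h, "writer-b version 2.0");
        System.out.println(split(h));
    }
}
```

   Note that this only guarantees round-tripping within code that knows the separator; it does nothing about tools that truncate the displayed string at `'\n'`, which is the concern raised above.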
   
   Since we are discussing adding a new entry (`original.created.by`) to the key-value metadata, I'd like to raise two related issues that arise once we support rewriting (merging) several files into one:
   - We would need to merge `original.created.by` from all input files, which makes it difficult to tell which `created_by` came from which input file. Therefore, `original.created.by` should be dropped in this case.
   - Is there any key-value metadata that may conflict across input files and should therefore be handled by the rewriter? For now, we simply keep all the key-value metadata from the original file.
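   One possible merge policy for the two points above, sketched in Java under the assumption that each input file's key-value metadata is available as a `Map<String, String>`: drop `original.created.by` whenever more than one file is merged, and fail on any other key whose values disagree. `KeyValueMetadataMerger` is a hypothetical class, not part of parquet-mr.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of merging key-value metadata from several input files.
public class KeyValueMetadataMerger {
    // The key proposed in this discussion.
    static final String ORIGINAL_CREATED_BY = "original.created.by";

    static Map<String, String> merge(List<Map<String, String>> inputs) {
        boolean multipleInputs = inputs.size() > 1;
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> kv : inputs) {
            for (Map.Entry<String, String> e : kv.entrySet()) {
                String key = e.getKey();
                if (multipleInputs && ORIGINAL_CREATED_BY.equals(key)) {
                    // Cannot be attributed to a single input file, so drop it.
                    continue;
                }
                // Same key with a different value in another input: a conflict.
                if (merged.containsKey(key) && !Objects.equals(merged.get(key), e.getValue())) {
                    throw new IllegalStateException("Conflicting values for key '" + key + "'");
                }
                merged.put(key, e.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical keys and values, for illustration only.
        Map<String, String> a = Map.of("example.key", "shared", ORIGINAL_CREATED_BY, "writer-a version 1.0");
        Map<String, String> b = Map.of("example.key", "shared", ORIGINAL_CREATED_BY, "writer-b version 2.0");
        System.out.println(merge(List.of(a, b)));
    }
}
```

   Throwing on conflicts is only one choice; a rewriter could instead keep the first or last value, or namespace keys per input file, depending on how the questions above are resolved.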
   
   @gszadovszky @ggershinsky @shangxinli Thoughts?
   
   If this behavior requires further discussion, I'd suggest keeping the current handling of `created_by` unchanged in this pull request, which is already large enough. All rewriters (ColumnPruner, CompressionConverter, ColumnMasker, and ColumnEncrypter) currently drop the original `created_by` and store the current writer version in the footer.