raduteo commented on pull request #8130:
URL: https://github.com/apache/arrow/pull/8130#issuecomment-690275695


   A row-group-level file name is certainly supported by fastparquet:
   
https://github.com/dask/fastparquet/blob/0402257560e20b961a517ee6d770e0995e944163/fastparquet/api.py#L187
 
   
   and the Java code does read `file_path` (again with the 
one-file-per-row-group constraint): 
   
https://github.com/apache/parquet-mr/blob/65b95fb72be8f5a8a193a6f7bc4560fdcd742fc7/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java#L1410
   
   but it’s less clear how it is used (the ParquetFileReader class certainly 
seems to stick to a single file).
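   
   To make the one-file-per-row-group constraint concrete, here is a minimal 
sketch in plain Python. The `ColumnChunk` and `RowGroup` classes and the 
`row_group_file_path` helper are hypothetical stand-ins, not the fastparquet or 
parquet-mr API; only the optional `file_path` field is taken from the spec. It 
assumes a reader that rejects row groups whose column chunks point at 
different files:
   
```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ColumnChunk:
    # Stand-in for the optional `file_path` field of ColumnChunk in
    # parquet.thrift; None means the data is in the same file as the metadata.
    file_path: Optional[str] = None


@dataclass
class RowGroup:
    # Hypothetical container: one entry per column chunk in the row group.
    columns: List[ColumnChunk]


def row_group_file_path(row_group: RowGroup) -> Optional[str]:
    """Return the single file path shared by every column chunk.

    Raises ValueError when the chunks of one row group point at
    different files (the assumed one-file-per-row-group constraint).
    """
    paths = {chunk.file_path for chunk in row_group.columns}
    if len(paths) > 1:
        raise ValueError(
            f"row group spans multiple files: {sorted(map(str, paths))}"
        )
    # An empty row group, or one with no file_path set, resolves to None.
    return paths.pop() if paths else None
```
   
   A reader built this way can group row groups by the returned path and open 
each data file once.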
   
   More than anything, the feature is fully in line with the Parquet spec 
(unless I am misreading something):
   
   
https://github.com/apache/parquet-format/blob/01971a532e20ff8e5eba9d440289bfb753f0cf0b/src/main/thrift/parquet.thrift#L769
   
   Also, the code changes do not affect any existing behavior: even if one 
uses the proposed `Snapshot` method during file writing, the final file is 
still readable by the Java and fastparquet implementations.
   
   I am happy to open a discussion on the Parquet list and push for broader 
support around this feature, but given that it is spec-compliant and backward 
compatible with the existing code, I hope we can allow this PR to proceed 
independently.
   
   > On Sep 10, 2020, at 1:46 AM, emkornfield <notificati...@github.com> wrote:
   > 
   > 
   > I don't think we should support this unless we can get consensus on 
dev@parquet mailing list that we want to support this across java and C++ (if 
java already supports it a pointer would be useful).
   > 
   > 
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

