Micah and all,
Thanks for the pointer; I certainly didn’t follow that thread in detail at the time.

My question is actually more limited in scope: I am specifically targeting
features that are supported by the standard AND by other major Parquet
implementations.

Specifically, I would like to enable support for having RowGroups in
separate files and (as a side effect) be able to keep the metadata in a
separate file.
This seems to be supported by the spec and by most readers, including Arrow (at
least from scanning the code).
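
To make the layout concrete, here is a minimal sketch in plain Python. It is a toy model of the footer metadata, not the Arrow or parquet-cpp API: the class and function names (ColumnChunk, RowGroup, FileMetaData, external_row_group, append_row_group) and the part-N.parquet file names are all hypothetical. The one thing taken from the spec is the optional file_path field on ColumnChunk in parquet.thrift, which, when unset, means the data lives in the same file as the metadata.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ColumnChunk:
    """Toy stand-in for a column chunk entry in the footer (hypothetical model)."""
    column: str
    # Models the optional `file_path` field from parquet.thrift.
    # None means the chunk is stored in the same file as the metadata.
    file_path: Optional[str] = None


@dataclass(frozen=True)
class RowGroup:
    num_rows: int
    columns: Tuple[ColumnChunk, ...]


@dataclass(frozen=True)
class FileMetaData:
    """Immutable, so every append yields a new metadata version."""
    row_groups: Tuple[RowGroup, ...] = ()

    @property
    def num_rows(self) -> int:
        return sum(rg.num_rows for rg in self.row_groups)

    def data_files(self) -> List[str]:
        """External files referenced by this metadata, in first-seen order."""
        seen: List[str] = []
        for rg in self.row_groups:
            for chunk in rg.columns:
                if chunk.file_path is not None and chunk.file_path not in seen:
                    seen.append(chunk.file_path)
        return seen


def external_row_group(path: str, num_rows: int, column_names: List[str]) -> RowGroup:
    """Describe a row group whose column chunks all live in `path`."""
    return RowGroup(num_rows, tuple(ColumnChunk(name, path) for name in column_names))


def append_row_group(meta: FileMetaData, row_group: RowGroup) -> FileMetaData:
    """Return a new metadata version; the old one is left untouched."""
    return FileMetaData(meta.row_groups + (row_group,))


# Version 1 of the standalone metadata points at one data file.
v1 = append_row_group(FileMetaData(), external_row_group("part-0.parquet", 100, ["a", "b"]))
# Version 2 adds a row group stored in a second data file.
v2 = append_row_group(v1, external_row_group("part-1.parquet", 50, ["a", "b"]))
```

In this model, a consumer holding v1 keeps seeing 100 rows in one data file, while v2 describes 150 rows across two files. That is the property I am after for appends: new row groups plus a new metadata file, without disturbing existing readers.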

If the above is true (or at least not known to be false), it seems that the
writer can be modified fairly easily to support this, and I am happy to look
into making that change.

Thoughts?
Radu

PS: I don’t mean to be stubborn by keeping this on the Arrow list, but it seems
like an Arrow-implementation-specific goal.





> On Sep 3, 2020, at 6:42 PM, Micah Kornfield <emkornfi...@gmail.com> wrote:
> 
> Hi Radu,
> This is a conversation best had on dev@parquet.  It came up recently [1]
> and I cross-posted there as well.
> 
> [1]
> https://lists.apache.org/thread.html/re4fe4bc80c9eadd446761588f9b03d827193f91269a7c14ce0c444dd%40%3Cdev.arrow.apache.org%3E
> 
> On Thu, Sep 3, 2020 at 3:20 PM Radu Teodorescu <radukay...@yahoo.com.invalid>
> wrote:
> 
>> Hello,
>> What is the current thinking around allowing the logical content of a
>> parquet file to be split across multiple files?
>> I see that in theory there is support for reading files where different
>> row groups are in separate files but I cannot see any features that allow
>> that for writing.
>> 
>> On a somewhat related note, what are the thoughts on supporting parquet
>> file append mode?
>> Specifically, if the metadata is stored in a standalone file one can
>> easily add new row groups to an existing file and create a new version of
>> the metadata file without affecting potential consumers of the existing
>> data.
>> 
>> 
>> 
