Micah and all,

Thanks for that pointer; I certainly didn’t follow it in detail at the time.

My question is actually more limited in scope: I am specifically targeting features that are supported by the standard AND by other major Parquet implementations. Specifically, I would like to enable support for having RowGroups in separate files and (as a side effect) be able to keep the metadata in a separate file. This seems to be supported by the spec and by most readers, including Arrow (at least from scanning the code).

If the above is true (or at least not known to be false), it seems the writer can be modified fairly easily to support this, and I am happy to look into making that change.

Thoughts?

Radu

PS: I don’t mean to be stubborn by keeping this on the Arrow group, but it seems like an Arrow-implementation-specific goal.

> On Sep 3, 2020, at 6:42 PM, Micah Kornfield <emkornfi...@gmail.com> wrote:
>
> Hi Radu,
> This is a conversation best had on dev@parquet. It came up recently [1]
> and I cross-posted there as well.
>
> [1]
> https://lists.apache.org/thread.html/re4fe4bc80c9eadd446761588f9b03d827193f91269a7c14ce0c444dd%40%3Cdev.arrow.apache.org%3E
>
> On Thu, Sep 3, 2020 at 3:20 PM Radu Teodorescu <radukay...@yahoo.com.invalid>
> wrote:
>
>> Hello,
>> What is the current thinking around allowing the logical content of a
>> parquet file to be split across multiple files?
>> I see that in theory there is support for reading files where different
>> row groups are in separate files but I cannot see any features that allow
>> that for writing.
>>
>> On a somewhat related note, what are the thoughts on supporting parquet
>> file append mode?
>> Specifically if the metadata is stored in a standalone file one can
>> easily add new row groups to an existing file and create a new version of
>> the metadata file without affecting potential consumers of the existing
>> data.
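
To make the append idea from the original message concrete, here is a toy sketch in Python. It does not use the Parquet format or the Arrow writer at all: plain JSON files stand in for both the row-group data and the footer metadata, and every name in it (append_row_group, read_rows, metadata.N.json, rowgroup-N.json) is made up for illustration. The only thing borrowed from the spec is that metadata may point at data living in another file through a file_path field (the spec has this on ColumnChunk; the toy attaches it per row group for simplicity).

```python
import json
import os
import tempfile


def latest_version(root):
    """Return the highest metadata version found in root, or 0 if none."""
    versions = [
        int(name.split(".")[1])
        for name in os.listdir(root)
        if name.startswith("metadata.") and name.endswith(".json")
    ]
    return max(versions, default=0)


def read_metadata(root, version):
    """Load one metadata version; version 0 means an empty dataset."""
    if version == 0:
        return {"row_groups": []}
    with open(os.path.join(root, f"metadata.{version}.json")) as f:
        return json.load(f)


def append_row_group(root, rows):
    """Write rows to a new data file, then publish a new metadata version."""
    version = latest_version(root)
    meta = read_metadata(root, version)
    data_name = f"rowgroup-{len(meta['row_groups'])}.json"
    with open(os.path.join(root, data_name), "w") as f:
        json.dump(rows, f)
    # Existing entries are carried over unchanged; one new entry is added.
    meta["row_groups"].append({"file_path": data_name, "num_rows": len(rows)})
    new_version = version + 1
    # Older metadata files are never rewritten, so readers pinned to them
    # keep seeing exactly the data they saw before.
    with open(os.path.join(root, f"metadata.{new_version}.json"), "w") as f:
        json.dump(meta, f)
    return new_version


def read_rows(root, version):
    """Read all rows visible to a consumer pinned to the given version."""
    rows = []
    for rg in read_metadata(root, version)["row_groups"]:
        with open(os.path.join(root, rg["file_path"])) as f:
            rows.extend(json.load(f))
    return rows


root = tempfile.mkdtemp()
v1 = append_row_group(root, [1, 2, 3])
v2 = append_row_group(root, [4, 5])
old_view = read_rows(root, v1)  # unaffected by the second append
new_view = read_rows(root, v2)  # sees both row groups
```

The point of the sketch is only the ordering: data files are written first and never modified, and each append produces a fresh metadata file, so a consumer holding an older metadata version is not disturbed. A real implementation would also need to deal with concurrent writers and atomic publication of the new metadata, which this ignores.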