petenewcomb opened a new issue, #40630: URL: https://github.com/apache/arrow/issues/40630
### Describe the enhancement requested

The Parquet file format allows a file to continue to accumulate row groups after a footer has been written, as long as a new and cumulative footer is written afterward. This is useful when writing a stream of data directly to Parquet and needing to make sure that the data is fully durable and readable within some time bound.

For this purpose I propose a new method `FlushWithFooter` on `file.Writer` that, like its sibling `Close`, would close any open row group and prepare and write out the file footer. Unlike `Close`, it would leave the writer's metadata structures intact, allowing subsequent row groups to be written without starting over, thus ensuring that the metadata written into subsequent footers via `FlushWithFooter` or `Close` covers all row groups written since the beginning of the file. (A usage sketch follows at the end of this description.)

The alternative, and what is supported today, is to close the open file once the time bound has been reached and start a new one. This works for durability, but it is inefficient for readers, who must now open and process the footers of a potentially much larger number of files. The typical workflow is to have a second process "compact" these smaller files into larger ones that not only consolidate footers but apply other optimizations (such as z-ordering) that holistically reorganize the consolidated data to match observed or expected query patterns. While effective for readers of older data, such compactions take time and significant resources to execute, putting a practical lower bound on the freshness of their outputs.

This feature, if adopted, would allow writers to produce data into a modest and predictable number of files within a strict time bound for durability, so that readers enjoy the same time bound and the same modest number of files when querying fresh data, without an intervening compaction step. Compaction would still be recommended, both to apply holistic optimizations and to collapse the extra footers inserted into the original files, but it would be less urgent since it would no longer constrain freshness or the manageability of file cardinality.

### Component(s)

Go, Parquet
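
---

For illustration, a rough sketch of how a streaming writer might use the proposed method. This is hypothetical: `FlushWithFooter` does not exist yet and its `error` return is assumed; `batch` and `writeBatch` are application-side placeholders; the `AppendBufferedRowGroup`/`Close` calls are assumed to behave as in the current `file.Writer` API.

```go
package example

import (
	"time"

	"github.com/apache/arrow/go/v16/parquet/file" // module major version may differ
)

// batch is a stand-in for whatever unit of buffered records the
// application accumulates between row groups.
type batch struct{ /* application-defined */ }

// writeBatch is a placeholder for writing one batch's columns into a
// row group via the row group writer's column writers.
func writeBatch(rg file.BufferedRowGroupWriter, b batch) error {
	// ... write column chunks for b ...
	return nil
}

// writeWithPeriodicFooters appends row groups to a single file and, on a
// timer, writes a cumulative footer so everything written so far becomes
// durable and readable without closing the file.
func writeWithPeriodicFooters(w *file.Writer, batches <-chan batch, flushEvery time.Duration) error {
	ticker := time.NewTicker(flushEvery)
	defer ticker.Stop()

	for {
		select {
		case b, ok := <-batches:
			if !ok {
				// End of stream: Close writes the final cumulative footer,
				// covering every row group written since the file was opened.
				return w.Close()
			}
			rg := w.AppendBufferedRowGroup()
			if err := writeBatch(rg, b); err != nil {
				return err
			}
			if err := rg.Close(); err != nil {
				return err
			}
		case <-ticker.C:
			// Proposed method: close any open row group and write a
			// cumulative footer, but keep the writer's metadata so later
			// row groups (and later footers) still cover the whole file.
			if err := w.FlushWithFooter(); err != nil {
				return err
			}
		}
	}
}
```

With this pattern a single long-lived file stays readable at the configured interval, and a reader that opens it between flushes simply sees the row groups described by the most recently written footer.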