Victor created PARQUET-1559:
-------------------------------
Summary: Add way to manually commit already written data to disk
Key: PARQUET-1559
URL: https://issues.apache.org/jira/browse/PARQUET-1559
Project: Parquet
Issue Type: Improvement
Components: parquet-mr
Affects Versions: 1.10.1
Reporter: Victor
I'm not sure whether this is compatible with the way Parquet works, but I have
the following need:
* I'm using parquet-avro to write to a Parquet file during a long-running
process
* From time to time, I would like to be able to access the data already written
So I was expecting to be able to manually flush the file, to ensure the data is
on disk, and then copy the file for preliminary analysis.
If this conflicts with the way Parquet works (for example, because the
metadata is stored in the footer of the file), what would the alternative be?
Closing the file and opening a new one to continue writing?
Could this be supported directly by parquet-mr? In that case, it would write
multiple files.
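
To make the "close and open a new file" idea concrete, here is a rough sketch
of what such a rolling writer might look like. It is not an existing parquet-mr
API: RollingWriter, Sink, SinkFactory and the file naming scheme are all
hypothetical names made up for illustration. Sink only mirrors the write() and
close() methods that ParquetWriter exposes, so in practice the factory would
presumably build an AvroParquetWriter for each new file.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: starts a new file after a fixed number of records. */
final class RollingWriter<T> implements Closeable {

    /** Minimal writer abstraction (assumption: backed by a ParquetWriter<T>). */
    interface Sink<T> extends Closeable {
        void write(T record) throws IOException;
    }

    /** Opens a sink for a file name, e.g. by building an AvroParquetWriter. */
    interface SinkFactory<T> {
        Sink<T> open(String fileName) throws IOException;
    }

    private final SinkFactory<T> factory;
    private final String prefix;
    private final long maxRecordsPerFile;
    private final List<String> completedFiles = new ArrayList<>();
    private Sink<T> current;
    private String currentName;
    private long recordsInCurrent;
    private int fileIndex;

    RollingWriter(SinkFactory<T> factory, String prefix, long maxRecordsPerFile) {
        this.factory = factory;
        this.prefix = prefix;
        this.maxRecordsPerFile = maxRecordsPerFile;
    }

    /** Writes one record, lazily opening the next file when needed. */
    void write(T record) throws IOException {
        if (current == null) {
            currentName = prefix + "-" + fileIndex++ + ".parquet";
            current = factory.open(currentName);
            recordsInCurrent = 0;
        }
        current.write(record);
        if (++recordsInCurrent >= maxRecordsPerFile) {
            roll();
        }
    }

    /** Closes the current file; only closed files are reported as complete. */
    void roll() throws IOException {
        if (current != null) {
            current.close();
            completedFiles.add(currentName);
            current = null;
        }
    }

    /** Names of files that have been closed and should be safe to copy. */
    List<String> completedFiles() {
        return new ArrayList<>(completedFiles);
    }

    @Override
    public void close() throws IOException {
        roll();
    }
}
```

With something like this, calling roll() at a convenient moment would play the
role of the manual "commit" I am asking about: everything written so far ends
up in closed files that can be copied, and writing continues in a new file.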