Victor created PARQUET-1559:
-------------------------------

             Summary: Add way to manually commit already written data to disk
                 Key: PARQUET-1559
                 URL: https://issues.apache.org/jira/browse/PARQUET-1559
             Project: Parquet
          Issue Type: Improvement
          Components: parquet-mr
    Affects Versions: 1.10.1
            Reporter: Victor


I'm not sure whether this is compatible with the way Parquet works, but I have
the following need:
 * I'm using parquet-avro to write to a Parquet file during a long-running
process
 * From time to time, I would like to be able to access the data already
written

So I was expecting to be able to manually flush the file to ensure the data is
on disk, and then copy the file for preliminary analysis.

If this contradicts the way Parquet works (for example, because the file
metadata is stored in a footer at the end of the file), what would the
alternative be?
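For context on the footer point: per the Parquet format specification, a file
starts with the 4-byte magic "PAR1" and ends with the file metadata, a 4-byte
little-endian metadata length, and "PAR1" again. parquet-mr writes that footer
when the writer is closed, so a copy taken mid-write normally lacks it and
readers will typically reject it. The following is a minimal JDK-only sketch of
a heuristic check for that trailing layout; `ParquetFooterCheck` and `hasFooter`
are hypothetical names, and the Thrift-encoded metadata itself is not parsed or
validated.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical helper: heuristically checks whether a Parquet file has its
 * footer, i.e. whether the writer was closed. Assumed layout (Parquet spec):
 * "PAR1" | data | footer metadata | 4-byte LE footer length | "PAR1".
 * The footer metadata itself is not parsed.
 */
final class ParquetFooterCheck {
    private static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);
    // Leading magic + footer length field + trailing magic.
    private static final int MIN_OVERHEAD = 12;

    private ParquetFooterCheck() { }

    /** Core check on the first 4 bytes, the last 8 bytes and the total length. */
    static boolean check(byte[] head4, byte[] tail8, long fileLength) {
        if (fileLength <= MIN_OVERHEAD) return false;
        if (!Arrays.equals(head4, MAGIC)) return false;
        if (!Arrays.equals(Arrays.copyOfRange(tail8, 4, 8), MAGIC)) return false;
        int footerLength = ByteBuffer.wrap(tail8, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
        // The footer must be non-empty and fit between the leading magic and the length field.
        return footerLength > 0 && footerLength <= fileLength - MIN_OVERHEAD;
    }

    /** Checks an in-memory copy of a file. */
    static boolean hasFooter(byte[] file) {
        int n = file.length;
        if (n <= MIN_OVERHEAD) return false;
        return check(Arrays.copyOfRange(file, 0, 4), Arrays.copyOfRange(file, n - 8, n), n);
    }

    /** Checks a file on disk, reading only its first 4 and last 8 bytes. */
    static boolean hasFooter(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long n = raf.length();
            if (n <= MIN_OVERHEAD) return false;
            byte[] head = new byte[4];
            raf.readFully(head);
            byte[] tail = new byte[8];
            raf.seek(n - 8);
            raf.readFully(tail);
            return check(head, tail, n);
        }
    }
}
```

Run against a copy taken while the writer is still open, such a check should
return false. It is only a quick sanity test: a file that passes can still have
a corrupt footer.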

Closing the file and opening a new one to continue writing?

Could parquet-mr support this directly? In that case, it would write multiple
files.
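The "close the file and open a new one" workaround can be sketched as a small
rolling wrapper. This is not a parquet-mr feature: `RecordSink`, `SinkFactory`
and `RollingWriter` are hypothetical, JDK-only types. In practice a factory
could wrap a `ParquetWriter<GenericRecord>` built with
`AvroParquetWriter.builder(...)`, since `ParquetWriter` provides `write` and
`close`; the file naming scheme below is likewise an assumption.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical minimal writer surface (e.g. an adapter over ParquetWriter<T>). */
interface RecordSink<T> extends Closeable {
    void write(T record) throws IOException;
}

/** Hypothetical factory that opens a new sink for the given file name. */
interface SinkFactory<T> {
    RecordSink<T> open(String fileName) throws IOException;
}

/**
 * Writes records to a sequence of files, closing the current file every
 * recordsPerFile records. For Parquet, closing is what writes the footer,
 * so closed files are the ones that can be copied and read.
 */
final class RollingWriter<T> implements Closeable {
    private final SinkFactory<T> factory;
    private final String prefix;
    private final long recordsPerFile;
    private final List<String> finishedFiles = new ArrayList<>();
    private RecordSink<T> current;
    private String currentName;
    private long countInCurrent;
    private int fileIndex;

    RollingWriter(SinkFactory<T> factory, String prefix, long recordsPerFile) {
        if (recordsPerFile <= 0) throw new IllegalArgumentException("recordsPerFile must be positive");
        this.factory = factory;
        this.prefix = prefix;
        this.recordsPerFile = recordsPerFile;
    }

    void write(T record) throws IOException {
        // Open lazily so that rolling never leaves an empty file behind.
        if (current == null) {
            currentName = String.format("%s-%05d.parquet", prefix, fileIndex++);
            current = factory.open(currentName);
            countInCurrent = 0;
        }
        current.write(record);
        if (++countInCurrent >= recordsPerFile) roll();
    }

    /** Closes the current file, if any, so its data becomes readable. */
    void roll() throws IOException {
        if (current != null) {
            current.close();
            finishedFiles.add(currentName);
            current = null;
        }
    }

    /** Names of files that have been closed. */
    List<String> finishedFiles() {
        return Collections.unmodifiableList(new ArrayList<>(finishedFiles));
    }

    @Override
    public void close() throws IOException {
        roll();
    }
}
```

Calling `roll()` on demand approximates the "manual commit" asked for here:
everything written so far ends up in closed files, at the cost of producing
several files instead of one.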



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
