[
https://issues.apache.org/jira/browse/ARROW-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17519140#comment-17519140
]
Rok Mihevc commented on ARROW-16147:
------------------------------------
[~emkornfield]
> [C++] ParquetFileWriter doesn't call sink_.Close when using
> GcsRandomAccessFile
> -------------------------------------------------------------------------------
>
> Key: ARROW-16147
> URL: https://issues.apache.org/jira/browse/ARROW-16147
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Reporter: Rok Mihevc
> Priority: Major
> Labels: GCP
>
> parquet::arrow::FileWriter::Close does not close the underlying sink. The
> call ends up in FileSerializer::Close:
> {code:cpp}
> void Close() override {
>   if (is_open_) {
>     // If any functions here raise an exception, we set is_open_ to be false
>     // so that this does not get called again (possibly causing segfault)
>     is_open_ = false;
>     if (row_group_writer_) {
>       num_rows_ += row_group_writer_->num_rows();
>       row_group_writer_->Close();
>     }
>     row_group_writer_.reset();
>     // Write magic bytes and metadata
>     auto file_encryption_properties =
>         properties_->file_encryption_properties();
>     if (file_encryption_properties == nullptr) {  // Non encrypted file.
>       file_metadata_ = metadata_->Finish();
>       WriteFileMetaData(*file_metadata_, sink_.get());
>     } else {  // Encrypted file
>       CloseEncryptedFile(file_encryption_properties);
>     }
>   }
> }
> {code}
> It doesn't call sink_->Close(), which leads to resource leaks and bugs.
> With files it works fine (they call close() in their destructor), but it
> doesn't work with fs::GcsRandomAccessFile. When I call
> parquet::arrow::FileWriter::Close, the data is not flushed to storage until
> the sink stream is closed manually (or the stack space changes).
> Is this intentional, or is it a bug?
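
The behaviour described above can be reproduced in miniature without Arrow. The following is only a sketch under stated assumptions: BufferedSink and SketchWriter are hypothetical stand-ins (not Arrow or Parquet classes) for a remote sink that publishes data only on Close() and for a writer whose Close() writes a footer but never closes its sink.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a remote, buffered sink: bytes become visible in
// "storage" only when Close() is called. There is no flush in the destructor.
class BufferedSink {
 public:
  explicit BufferedSink(std::string* storage) : storage_(storage) {}

  void Write(const std::string& data) {
    assert(!closed_);  // writing after Close() is a caller error in this sketch
    buffer_ += data;
  }

  void Close() {
    if (closed_) return;  // make Close() idempotent
    closed_ = true;
    *storage_ += buffer_;  // publish the buffered bytes
    buffer_.clear();
  }

 private:
  std::string* storage_;
  std::string buffer_;
  bool closed_ = false;
};

// Hypothetical writer that mirrors the reported pattern: Close() finishes the
// file (here, a fake footer) but does not call Close() on its sink.
class SketchWriter {
 public:
  explicit SketchWriter(std::shared_ptr<BufferedSink> sink)
      : sink_(std::move(sink)) {}

  void WriteRow(const std::string& row) { sink_->Write(row); }

  void Close() {
    if (!is_open_) return;
    is_open_ = false;
    sink_->Write("FOOTER");  // note: no sink_->Close() here
  }

 private:
  std::shared_ptr<BufferedSink> sink_;
  bool is_open_ = true;
};

// Closes only the writer and returns what reached storage.
std::string StoredAfterWriterClose() {
  std::string storage;
  auto sink = std::make_shared<BufferedSink>(&storage);
  SketchWriter writer(sink);
  writer.WriteRow("row;");
  writer.Close();
  return storage;
}

// Closes the writer and then the sink explicitly, as the manual workaround does.
std::string StoredAfterExplicitSinkClose() {
  std::string storage;
  auto sink = std::make_shared<BufferedSink>(&storage);
  SketchWriter writer(sink);
  writer.WriteRow("row;");
  writer.Close();
  sink->Close();
  return storage;
}
```

In this sketch, StoredAfterWriterClose() returns an empty string, while StoredAfterExplicitSinkClose() returns the row followed by the footer. That matches the report: closing the writer alone is not enough when the sink does not flush itself on destruction.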