Philip Felton created PARQUET-1374:
--------------------------------------
Summary: Segfault on writing zero columns
Key: PARQUET-1374
URL: https://issues.apache.org/jira/browse/PARQUET-1374
Project: Parquet
Issue Type: Bug
Reporter: Philip Felton
Here's a gist which reproduces it:
[https://gist.github.com/philjdf/594ab431f135a040586aff08c7fb7666]
# The problem starts with the call to ParquetFileWriter::Close().
# As a result of that call,
FileMetaDataBuilder::FileMetaDataBuilderImpl::Finish() gets called, which
relies on metadata_ being non-null. At the end of Finish(), it
std::moves metadata_ somewhere else, leaving it null. So it evidently
assumes it only gets called once.
# Later on still inside Close(), FlatSchemaConverter::Convert() gets called,
which throws an exception because we have no columns.
# In handling this exception, we leave the try block, which destructs our
ParquetFileWriter. The destructor calls Close() again, which calls Finish()
again. metadata_ is now null, so Finish() segfaults.
So FileSerializer::Close() in file_writer.cc is presumably wrong: it should set
is_open_ to false at the start of the if block rather than at the end.
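To illustrate the proposed change, here is a minimal standalone sketch. It does
not use the real parquet-cpp classes: MetadataBuilder, Serializer, num_columns
and FinishCallsForZeroColumns are hypothetical stand-ins I made up, and Finish()
throws on a second call where the real code would dereference a null pointer.
The only point is the ordering of is_open_ = false inside Close().

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for the metadata builder: Finish() moves metadata_
// out, so a second call would find it null.
struct MetadataBuilder {
  std::unique_ptr<std::string> metadata_ = std::make_unique<std::string>("meta");
  int finish_calls = 0;

  std::unique_ptr<std::string> Finish() {
    ++finish_calls;
    if (!metadata_) {
      // The real code would segfault here; the sketch throws instead.
      throw std::logic_error("Finish() called twice");
    }
    return std::move(metadata_);
  }
};

// Hypothetical stand-in for the file serializer.
class Serializer {
 public:
  Serializer(MetadataBuilder* builder, int num_columns)
      : builder_(builder), num_columns_(num_columns) {
    assert(builder != nullptr);
  }

  // Defensive: a destructor must not let an exception escape.
  ~Serializer() {
    try {
      Close();
    } catch (...) {
    }
  }

  void Close() {
    if (is_open_) {
      // Proposed ordering: mark the writer closed before doing any work that
      // can throw, so a second Close() (e.g. from the destructor) is a no-op.
      is_open_ = false;
      auto metadata = builder_->Finish();
      if (num_columns_ == 0) {
        // Mimics the schema conversion failing because there are no columns.
        throw std::runtime_error("schema has no columns");
      }
    }
  }

 private:
  MetadataBuilder* builder_;
  int num_columns_;
  bool is_open_ = true;
};

// Counts how often Finish() runs when Close() throws and the destructor
// then calls Close() again during stack unwinding.
int FinishCallsForZeroColumns() {
  MetadataBuilder builder;
  try {
    Serializer writer(&builder, /*num_columns=*/0);
    writer.Close();  // throws: no columns
  } catch (const std::runtime_error&) {
    // writer has already been destroyed here; its destructor called Close().
  }
  return builder.finish_calls;
}
```

With this ordering, FinishCallsForZeroColumns() sees Finish() run exactly once
and the caller gets the exception. If is_open_ = false were moved back to the
end of the if block, the destructor's Close() would reach Finish() a second time.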
It's better to get an exception than a segfault, but ideally I'd like to
write/read Parquet files with zero rows and/or zero columns. It means one less
edge case for client code.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)