[
https://issues.apache.org/jira/browse/ARROW-13654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joris Van den Bossche updated ARROW-13654:
------------------------------------------
Description:
Writing a tiny parquet file, to read in its metadata (to obtain a FileMetaData
object):
{code}
import pyarrow as pa
import pyarrow.parquet as pq
table = pa.table({'a': [1, 2, 3], 'b': [4, 5, 6]})
pq.write_table(table, "test_file_for_metadata.parquet")
metadata = pq.read_metadata("test_file_for_metadata.parquet")
metadata.append_row_groups(metadata)
{code}
The last line, which calls {{AppendRowGroups}} (appending the metadata object to
itself), keeps running with ever-increasing memory usage (I killed the process
when it was using 10 GB).
This is not a useful thing to do, but I still wouldn't expect it to blow up,
since one can do it accidentally; I was actually trying it in an attempt to
create a large FileMetaData object.
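For reference, a minimal sketch of how one could grow a FileMetaData object
without the self-aliasing: read a fresh metadata instance from the file for each
append, so source and destination are never the same object (the file name and
loop count below are just illustrative):
{code}
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({'a': [1, 2, 3], 'b': [4, 5, 6]})
pq.write_table(table, "test_file_for_metadata.parquet")

# Accumulate row groups into one FileMetaData, appending a separately
# read instance each time so the object is never appended to itself.
combined = pq.read_metadata("test_file_for_metadata.parquet")
for _ in range(10):
    other = pq.read_metadata("test_file_for_metadata.parquet")
    combined.append_row_groups(other)

print(combined.num_row_groups)  # 11 (1 original + 10 appended)
{code}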
> [C++][Parquet] Appending a FileMetaData object to itself explodes memory
> ------------------------------------------------------------------------
>
> Key: ARROW-13654
> URL: https://issues.apache.org/jira/browse/ARROW-13654
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Parquet
> Reporter: Joris Van den Bossche
> Priority: Major
>
> Writing a tiny parquet file, to read in its metadata (to obtain a
> FileMetaData object):
> {code}
> import pyarrow as pa
> import pyarrow.parquet as pq
> table = pa.table({'a': [1, 2, 3], 'b': [4, 5, 6]})
> pq.write_table(table, "test_file_for_metadata.parquet")
> metadata = pq.read_metadata("test_file_for_metadata.parquet")
> metadata.append_row_groups(metadata)
> {code}
> The last line, which calls {{AppendRowGroups}} (appending the metadata object
> to itself), keeps running with ever-increasing memory usage (I killed the
> process when it was using 10 GB).
> This is not a useful thing to do, but I still wouldn't expect it to blow up,
> since one can do it accidentally; I was actually trying it in an attempt to
> create a large FileMetaData object.