[ https://issues.apache.org/jira/browse/ARROW-11855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Antoine Pitrou resolved ARROW-11855.
------------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 9626
[https://github.com/apache/arrow/pull/9626]

> [C++] [Python] Memory leak in to_pandas when converting chunked struct array
> ----------------------------------------------------------------------------
>
>                 Key: ARROW-11855
>                 URL: https://issues.apache.org/jira/browse/ARROW-11855
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>            Reporter: Weston Pace
>            Assignee: Weston Pace
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Reproduction from [~shadowdsp]
> {code:java}
> import io
> import pandas as pd
> import pyarrow as pa
> pa.jemalloc_set_decay_ms(0)
> import pyarrow.parquet as pq
> from memory_profiler import profile
> @profile
> def read_file(f):
>     table = pq.read_table(f)
>     df = table.to_pandas(strings_to_categorical=True)
>     del table
>     del df
> def main():
>     rows = 2000000
>     df = pd.DataFrame({
>         "string": [{"test": [1, 2], "test1": [3, 4]}] * rows,
>         "int": [5] * rows,
>         "float": [2.0] * rows,
>     })
>     table = pa.Table.from_pandas(df, preserve_index=False)
>     parquet_stream = io.BytesIO()
>     pq.write_table(table, parquet_stream)
>     for i in range(3):
>         parquet_stream.seek(0)
>         read_file(parquet_stream)
> if __name__ == '__main__':
>     main()
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)