guozhans commented on issue #40738:
URL: https://github.com/apache/arrow/issues/40738#issuecomment-2024456764
Hi @kyle-ip,
I had Arrow 14.0.0 and a 16.0.0 dev version installed in different folders before, and I wasn't aware of the old version until that day. I removed Arrow 14.0.0 completely from my Ubuntu Docker container, rebuilt the C++ library from the main branch with the commands below, and then reinstalled PyArrow 16.0.0 dev (I know I could build it as well, but I'm a bit lazy; a typical install command is shown right after the build commands). Everything looks fine now.
```shell
# configure, build, and install the Arrow C++ library (run from a build directory; ".." is the C++ source dir)
cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
-DCMAKE_INSTALL_LIBDIR=lib \
-DCMAKE_BUILD_TYPE=Release \
-DARROW_BUILD_TESTS=ON \
-DARROW_COMPUTE=ON \
-DARROW_CSV=ON \
-DARROW_DATASET=ON \
-DARROW_FILESYSTEM=ON \
-DARROW_HDFS=ON \
-DARROW_JSON=ON \
-DARROW_PARQUET=ON \
-DARROW_WITH_BROTLI=ON \
-DARROW_WITH_BZ2=ON \
-DARROW_WITH_LZ4=ON \
-DARROW_WITH_SNAPPY=ON \
-DARROW_WITH_ZLIB=ON \
-DARROW_WITH_ZSTD=ON \
-DPARQUET_REQUIRE_ENCRYPTION=ON \
.. \
&& make -j4 \
&& make install
```
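The PyArrow 16.0.0 dev reinstall used a prebuilt nightly rather than a source build. One way to pull such a wheel is the Arrow nightlies index; this is a representative command rather than necessarily the exact one I ran:

```shell
# Install a PyArrow nightly (dev) wheel; --pre allows pre-release versions.
pip install --upgrade \
    --extra-index-url https://pypi.fury.io/arrow-nightlies/ \
    --prefer-binary --pre pyarrow
```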
Result:
```shell
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    29    395.4 MiB    395.4 MiB           1    @profile
    30                                          def to_parquet(df: pd.DataFrame, filename: str):
    31    372.2 MiB    -23.2 MiB           1        table = Table.from_pandas(df)
    32    372.2 MiB      0.0 MiB           1        pool = pa.default_memory_pool()
    33    396.4 MiB     24.2 MiB           1        parquet.write_table(table, filename, compression="snappy")
    34    396.4 MiB      0.0 MiB           1        del table
    35    396.4 MiB      0.0 MiB           1        pool.release_unused()
```
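The result above comes from memory_profiler. For completeness, here is roughly how such a run can be reproduced end to end; the DataFrame contents, size, and output path below are placeholders I'm assuming for illustration, and only the profiled function body matches the output above:

```shell
# Hypothetical reproduction script; adjust the DataFrame and paths to your case.
cat > repro_to_parquet.py <<'EOF'
import numpy as np
import pandas as pd
import pyarrow as pa
from pyarrow import Table, parquet
from memory_profiler import profile

@profile
def to_parquet(df: pd.DataFrame, filename: str):
    # Convert the DataFrame to an Arrow table and write it as Snappy-compressed Parquet.
    table = Table.from_pandas(df)
    pool = pa.default_memory_pool()
    parquet.write_table(table, filename, compression="snappy")
    del table
    # Ask the default memory pool to return unused memory to the OS.
    pool.release_unused()

if __name__ == "__main__":
    # Placeholder data, not the DataFrame used in the original run.
    df = pd.DataFrame({"x": np.random.rand(1_000_000)})
    to_parquet(df, "/tmp/test.parquet")
EOF
python repro_to_parquet.py
```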