[ https://issues.apache.org/jira/browse/ARROW-17473 ]
Sasha Sirovica deleted comment on ARROW-17473:
----------------------------------------
was (Author: JIRAUSER294638):
cc [~zeroshade] I see you've been doing some great work in this area! You might
find this interesting.
I have been looking into this issue myself, but so far I have not found the
root cause.
> [Go] String Binary Builder Leaks Memory When Writing to Parquet
> ---------------------------------------------------------------
>
> Key: ARROW-17473
> URL: https://issues.apache.org/jira/browse/ARROW-17473
> Project: Apache Arrow
> Issue Type: Bug
> Components: Go
> Affects Versions: 9.0.0
> Environment: Mac
> Reporter: Sasha Sirovica
> Priority: Major
>
> When using `arrow.BinaryTypes.String` in a schema, appending multiple
> strings, and then writing a record out to Parquet, the program's memory usage
> increases continuously.
>
> I took a heap dump on my computer midway through the program, and the majority
> of allocations come from `StringBuilder.Append`. Memory usage approached 16GB
> of RAM before I terminated the program.
>
> I was not able to replicate this behavior with just PrimitiveTypes. Another
> interesting point: if the records are created but never written with pqarrow,
> there is also no memory leak. In the program below, commenting out
> `w.Write(rec)` makes the memory growth go away.
>
> Example program which causes memory to leak:
> {code:java}
> package main
>
> import (
> 	"log"
> 	"os"
>
> 	"github.com/apache/arrow/go/v9/arrow"
> 	"github.com/apache/arrow/go/v9/arrow/array"
> 	"github.com/apache/arrow/go/v9/arrow/memory"
> 	"github.com/apache/arrow/go/v9/parquet"
> 	"github.com/apache/arrow/go/v9/parquet/compress"
> 	"github.com/apache/arrow/go/v9/parquet/pqarrow"
> )
>
> func main() {
> 	f, err := os.Create("/tmp/test.parquet")
> 	if err != nil {
> 		log.Fatal(err)
> 	}
> 	arrowProps := pqarrow.DefaultWriterProps()
> 	// Single-column schema with one string field.
> 	schema := arrow.NewSchema(
> 		[]arrow.Field{
> 			{Name: "aString", Type: arrow.BinaryTypes.String},
> 		},
> 		nil,
> 	)
> 	w, err := pqarrow.NewFileWriter(schema, f,
> 		parquet.NewWriterProperties(parquet.WithCompression(compress.Codecs.Snappy)),
> 		arrowProps)
> 	if err != nil {
> 		log.Fatal(err)
> 	}
> 	builder := array.NewRecordBuilder(memory.DefaultAllocator, schema)
> 	defer builder.Release()
> 	for i := 1; i < 50000000; i++ {
> 		builder.Field(0).(*array.StringBuilder).Append("HelloWorld!")
> 		if i%2000000 == 0 {
> 			// Write the accumulated rows out as a record every 2M appends.
> 			rec := builder.NewRecord()
> 			w.Write(rec) // error ignored for brevity
> 			rec.Release()
> 		}
> 	}
> 	if err := w.Close(); err != nil {
> 		log.Fatal(err)
> 	}
> }
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)