Re: [PR] perf(arrow): Reduce the amount of allocated objects [arrow-go]

via GitHub Wed, 21 Jan 2026 09:01:34 -0800


spiridonov commented on PR #645:
URL: https://github.com/apache/arrow-go/pull/645#issuecomment-3779708779


   Thank you for taking a look @zeroshade!
   
   I have changed a few things:
   * Added a private `newSchema` constructor that initializes a `Schema` 
without cloning fields and metadata. It is used internally by 
`WithEndianness()` and `AddField()`. The latter used to allocate the fields 
twice (once inside itself and again when calling `NewSchema`).
   * Rolled back my `Fields()` change as a middle ground, so the existing 
behavior remains.
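
For readers following along, the split between a cloning public constructor and a non-cloning private one could look roughly like the sketch below. It is not the actual arrow-go code: `Field` and `Schema` are simplified, hypothetical stand-ins, and only the names `NewSchema`, `newSchema` and `AddField` come from the comment above.

```go
package main

import "fmt"

// Field and Schema are simplified, hypothetical stand-ins for the arrow types.
type Field struct {
	Name string
}

type Schema struct {
	fields []Field
	index  map[string]int
}

// NewSchema is the public constructor. It clones the caller's slice so that
// later changes made by the caller cannot affect the schema.
func NewSchema(fields []Field) *Schema {
	cloned := make([]Field, len(fields))
	copy(cloned, fields)
	return newSchema(cloned)
}

// newSchema takes ownership of fields without copying. It is only safe for
// internal callers that have just built the slice themselves.
func newSchema(fields []Field) *Schema {
	index := make(map[string]int, len(fields))
	for i, f := range fields {
		index[f.Name] = i
	}
	return &Schema{fields: fields, index: index}
}

// AddField builds the new field slice once and hands it to newSchema.
// Calling NewSchema here instead would copy the slice a second time.
func (s *Schema) AddField(i int, f Field) (*Schema, error) {
	if i < 0 || i > len(s.fields) {
		return nil, fmt.Errorf("index %d out of range [0, %d]", i, len(s.fields))
	}
	fields := make([]Field, 0, len(s.fields)+1)
	fields = append(fields, s.fields[:i]...)
	fields = append(fields, f)
	fields = append(fields, s.fields[i:]...)
	return newSchema(fields), nil
}

func main() {
	s := NewSchema([]Field{{Name: "a"}, {Name: "c"}})
	s2, err := s.AddField(1, Field{Name: "b"})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(s.fields), len(s2.fields))
}
```

The public constructor keeps its defensive copy for external callers, while internal code paths that already own a freshly built slice skip it.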
   
   Each Loki query can touch hundreds of streams that are spread over thousands 
of data objects, which results in thousands of Arrow records being processed. 
Small functions such as `NewSchema` or `Fields`, when called thousands of 
times, quickly add up to gigabytes of allocations.
   
   I find `iter.Seq[Field]` a bit tricky. By itself it does not allocate 
anything. But calling `yield` on each iteration effectively allocates a 
`Field` value on the heap because 1) the value fails escape analysis (it is 
passed to another function and can be used after the current closure returns) 
and 2) the value has pointers inside. Using `iter.Seq[Field]` will not 
allocate a slice, but it will allocate the same number of values on the heap 
anyway.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
