zeroshade opened a new issue, #827:
URL: https://github.com/apache/arrow-go/issues/827

   ### Describe the bug, including details regarding any error messages, 
version, and platform.
   
   **Surfaced while implementing #825 (closes #815).** See the PR's "Out of 
scope" section and [this 
comment](https://github.com/apache/arrow-go/pull/825#issuecomment-4530190386).
   
   `*scalar.Extension` does not implement `Release()` or `Retain()`. As a 
result, when `compute.ScalarDatum.Release()` is called on a `*ScalarDatum` 
whose `Value` is an extension scalar, the storage scalar's `Release()` is never 
invoked and any underlying buffers leak.
   
   #### Mechanism
   
   `compute.ScalarDatum.Release()` only forwards `Release()` if the wrapped 
scalar satisfies the unexported `releasable` interface 
([`arrow/compute/datum.go#L122-L130`](https://github.com/apache/arrow-go/blob/v18.5.2/arrow/compute/datum.go#L122-L130)):
   
   ```go
   type releasable interface {
       Release()
   }
   
   func (d *ScalarDatum) Release() {
       if v, ok := d.Value.(releasable); ok {
           v.Release()
       }
   }
   ```
   
   `*scalar.Extension` 
([`arrow/scalar/scalar.go#L397-L467`](https://github.com/apache/arrow-go/blob/v18.5.2/arrow/scalar/scalar.go#L397-L467))
 defines `value()`, `equals()`, `Validate()`, `ValidateFull()`, `CastTo()`, and 
`String()` — but neither `Release()` nor `Retain()`. The type assertion in 
`ScalarDatum.Release()` therefore fails and the inner storage scalar is never 
released, even though the storage scalar itself (e.g. `*scalar.FixedSizeList`) 
carries a buffer-bearing array.
   
   Peer scalar types implement the pair correctly by delegating to their inner 
storage. For example, `*scalar.Binary` 
([`arrow/scalar/binary.go#L43-L53`](https://github.com/apache/arrow-go/blob/v18.5.2/arrow/scalar/binary.go#L43-L53)):
   
   ```go
   func (b *Binary) Retain() {
       if b.Value != nil {
           b.Value.Retain()
       }
   }
   
   func (b *Binary) Release() {
       if b.Value != nil {
           b.Value.Release()
       }
   }
   ```
   
   `*scalar.List`, `*scalar.Struct`, `*scalar.Dictionary`, 
`*scalar.SparseUnion`, `*scalar.DenseUnion`, and `*scalar.RunEndEncoded` all 
follow the same pattern in `arrow/scalar/nested.go`. `*scalar.Extension` is the 
lone outlier among scalar types whose storage holds buffers.
   
   #### Affected paths
   
   Anything that produces a `*scalar.Extension` and reaches 
`ScalarDatum.Release()`. Concretely in this repo:
   
   - The `*types.IntervalYearToMonthType` and `*types.IntervalDayType` branches 
in `*expr.ProtoLiteral` handling, factored into `intervalYearToMonthDatum` / 
the inline `*types.IntervalDayType` branch in 
[`arrow/compute/exprs/exec.go`](https://github.com/apache/arrow-go/blob/main/arrow/compute/exprs/exec.go)
 (the `intervalYearToMonthDatum` helper is added in #825; the underlying leak 
predates that PR).
   - The new `expr.IntervalYearToMonthLiteral` value/pointer cases added in 
#825.
   - The `intervalYear()` / `intervalDay()` extension type constructors in 
[`arrow/compute/exprs/extension_types.go#L142-L147`](https://github.com/apache/arrow-go/blob/v18.5.2/arrow/compute/exprs/extension_types.go#L142-L147).
   - Any user-defined extension type whose storage scalar holds buffers.
   
   #### Reproducer
   
   Drive any of the affected paths with `memory.NewCheckedAllocator` and call 
`Release()` on the resulting `*ScalarDatum`. `AssertSize(t, 0)` will fail 
because the storage scalar's buffer is never released. (#825's 
`TestLiteralToDatumIntervalYearToMonth` deliberately uses 
`memory.DefaultAllocator` for exactly this reason and explicitly defers the 
leak fix to this issue.)
   
   ### Expected behavior
   
   `*scalar.Extension` should implement `Release()` and `Retain()` by 
delegating to `s.Value` when the inner storage scalar implements `Releasable` 
([`arrow/scalar/scalar.go#L68-L71`](https://github.com/apache/arrow-go/blob/v18.5.2/arrow/scalar/scalar.go#L68-L71)),
 mirroring `*scalar.Binary` and the nested scalars. Suggested implementation:
   
   ```go
   func (s *Extension) Retain() {
       if r, ok := s.Value.(Releasable); ok {
           r.Retain()
       }
   }
   
   func (s *Extension) Release() {
       if r, ok := s.Value.(Releasable); ok {
           r.Release()
       }
   }
   ```
   
   This is purely additive:
   
   - Storage scalars that don't implement `Releasable` (e.g. primitive numeric 
storage) become no-ops, matching today's behavior.
   - Storage scalars that do (e.g. `*scalar.FixedSizeList` for the interval 
extension types) get properly released — closing the leak on the existing 
`*types.IntervalYearToMonthType` / `*types.IntervalDayType` paths and on any 
future extension types whose storage carries buffers.
   - The `Retain()` half restores symmetry: today, a caller that takes 
ownership of an extension scalar has no way to bump the storage refcount.
   
   A follow-up test using `memory.NewCheckedAllocator` with `AssertSize(t, 0)` 
around the interval extension paths in 
`arrow/compute/exprs/exec_internal_test.go` would lock in the fix.
   
   ### Component(s)
   
   Other (`arrow/scalar`, `arrow/compute`)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to