Copilot commented on code in PR #291:
URL: https://github.com/apache/arrow-dotnet/pull/291#discussion_r2971791573
##########
src/Apache.Arrow/C/CArrowArrayExporter.cs:
##########
@@ -142,11 +143,7 @@ private unsafe static void
ConvertArray(ExportedAllocationOwner sharedOwner, Arr
for (int i = 0; i < array.Buffers.Length; i++)
{
ArrowBuffer buffer = array.Buffers[i];
- IntPtr ptr;
- if (!buffer.TryExport(sharedOwner, out ptr))
- {
- throw new NotSupportedException($"An ArrowArray of
type {array.DataType.TypeId} could not be exported: failed on buffer #{i}");
- }
+ IntPtr ptr = buffer.Export(sharedOwner);
Review Comment:
`ConvertArray` previously emitted a clear `NotSupportedException` including
the buffer index when a buffer could not be exported. With the switch to
`ArrowBuffer.Export()`, any failure in `Memory.Pin()` will bubble up without
context about which buffer caused it. Consider catching pin/export failures
here and rethrowing a `NotSupportedException` (or wrapping exception) that
includes the buffer index and array type to keep diagnostics actionable.
```suggestion
IntPtr ptr;
try
{
ptr = buffer.Export(sharedOwner);
}
catch (Exception ex)
{
throw new NotSupportedException(
$"Failed to export buffer at index {i} for array
type '{array.DataType}'.",
ex);
}
```
##########
src/Apache.Arrow/ChunkedArray.cs:
##########
@@ -92,6 +92,43 @@ public ChunkedArray Slice(long offset)
return Slice(offset, Length - offset);
}
+ /// <summary>
+ /// Slice this chunked array with shared ownership. The returned slice
keeps the
+ /// underlying buffers alive via reference counting. The caller must
dispose
+ /// the returned chunked array when done.
+ /// </summary>
+ public ChunkedArray SliceShared(long offset, long length)
+ {
+ if (offset >= Length)
+ {
+ throw new ArgumentException($"Index {offset} cannot be greater
than the Column's Length {Length}");
Review Comment:
`ChunkedArray.SliceShared` throws an `ArgumentException` with a message that
refers to a "Column's Length", which is misleading in this context (this is a
`ChunkedArray`). Consider updating the message to reference
`ChunkedArray.Length` (and, if possible, aligning the wording with other
Slice/SliceShared APIs).
```suggestion
throw new ArgumentException($"Offset {offset} cannot be
greater than ChunkedArray.Length {Length}");
```
##########
src/Apache.Arrow/ChunkedArray.cs:
##########
@@ -92,6 +92,43 @@ public ChunkedArray Slice(long offset)
return Slice(offset, Length - offset);
}
+ /// <summary>
+ /// Slice this chunked array with shared ownership. The returned slice
keeps the
+ /// underlying buffers alive via reference counting. The caller must
dispose
+ /// the returned chunked array when done.
+ /// </summary>
+ public ChunkedArray SliceShared(long offset, long length)
+ {
+ if (offset >= Length)
+ {
+ throw new ArgumentException($"Index {offset} cannot be greater
than the Column's Length {Length}");
+ }
+
+ int curArrayIndex = 0;
+ int numArrays = Arrays.Count;
+ while (curArrayIndex < numArrays && offset >
Arrays[curArrayIndex].Length)
+ {
+ offset -= Arrays[curArrayIndex].Length;
+ curArrayIndex++;
+ }
Review Comment:
The loop that advances `curArrayIndex` uses `offset >
Arrays[curArrayIndex].Length`. If `offset` lands exactly on a chunk boundary
(e.g., `offset == Arrays[curArrayIndex].Length`), the method will add an empty
slice for that chunk before moving on, producing a `ChunkedArray` with a
0-length chunk. Consider changing the boundary condition so exact-end offsets
advance to the next chunk before slicing (and apply the same fix to `Slice` to
keep behaviors consistent).
##########
src/Apache.Arrow/Column.cs:
##########
@@ -62,6 +62,21 @@ public Column Slice(int offset)
return new Column(Field, Data.Slice(offset));
}
+ /// <summary>
+ /// Slice this column with shared ownership. The returned slice keeps
the
+ /// underlying buffers alive via reference counting. The caller must
dispose
+ /// the returned column when done.
Review Comment:
The XML doc for `Column.SliceShared` says the caller must dispose the
returned column, but `Column` does not implement `IDisposable` (and there is no
`Dispose()` method). The doc should instead explain how to release underlying
resources (e.g., disposing the underlying arrays in the returned column’s
`ChunkedArray`).
```suggestion
/// underlying buffers alive via reference counting.
/// <para>
/// The caller is responsible for releasing the underlying resources
when
/// finished with the returned column, for example by disposing each
/// <see cref="IArrowArray"/> in the returned column's
/// <see cref="ChunkedArray"/> (<see cref="Data"/>).
/// </para>
```
##########
src/Apache.Arrow/Arrays/ArrayData.cs:
##########
@@ -97,8 +101,57 @@ public ArrayData(
Dictionary = dictionary;
}
+ private ArrayData(
+ ArrayData parent,
+ IArrowType dataType,
+ int length, int nullCount, int offset,
+ ArrowBuffer[] buffers, ArrayData[] children, ArrayData dictionary)
+ {
+ _parent = parent;
+ DataType = dataType ?? NullType.Default;
+ Length = length;
+ NullCount = nullCount;
+ Offset = offset;
+ Buffers = buffers;
+ Children = children;
+ Dictionary = dictionary;
+ }
+
+ /// <summary>
+ /// Increment the reference count, allowing this ArrayData to be shared
+ /// across multiple owners. Each call to Acquire must be balanced by a
+ /// call to <see cref="Dispose"/>.
+ /// </summary>
+ /// <returns>This instance.</returns>
+ public ArrayData Acquire()
+ {
+ if (Interlocked.Increment(ref _referenceCount) <= 1)
+ {
+ Interlocked.Decrement(ref _referenceCount);
+ throw new ObjectDisposedException(nameof(ArrayData));
+ }
+ return this;
+ }
+
public void Dispose()
{
+ int remaining = Interlocked.Decrement(ref _referenceCount);
+ if (remaining > 0)
+ {
+ return;
+ }
+ if (remaining < 0)
+ {
+ throw new ObjectDisposedException(nameof(ArrayData),
"ArrayData has already been disposed.");
Review Comment:
`ArrayData.Dispose()` now throws `ObjectDisposedException` when called more
times than `Acquire()`. Since `Array.Dispose()` simply forwards to
`Data.Dispose()` without an idempotency guard, this change makes double-dispose
of any `Array` (and composite types like `RecordBatch` that dispose their child
arrays) throw at runtime. Consider making `ArrayData.Dispose()` idempotent
(returning when the refcount is already 0) or adding a disposed guard at
higher-level public `Dispose()` entry points so repeated disposal does not
throw.
```suggestion
// There are still active references; do not dispose yet.
return;
}
if (remaining < 0)
{
// Already fully disposed. Make Dispose idempotent by
restoring
// the reference count to zero and returning without
throwing.
Interlocked.Increment(ref _referenceCount);
return;
```
##########
src/Apache.Arrow/ChunkedArray.cs:
##########
@@ -92,6 +92,43 @@ public ChunkedArray Slice(long offset)
return Slice(offset, Length - offset);
}
+ /// <summary>
+ /// Slice this chunked array with shared ownership. The returned slice
keeps the
+ /// underlying buffers alive via reference counting. The caller must
dispose
+ /// the returned chunked array when done.
Review Comment:
The XML doc for `ChunkedArray.SliceShared` says the caller must dispose the
returned chunked array, but `ChunkedArray` does not implement `IDisposable`.
Please update the documentation to describe how callers should release
resources (typically by disposing the `IArrowArray` chunks contained in the
returned `ChunkedArray`).
```suggestion
/// underlying buffers alive via reference counting. Callers are
responsible for
/// disposing the <see cref="IArrowArray"/> chunks contained in the
returned
/// <see cref="ChunkedArray"/> when they are no longer needed.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]