Copilot commented on code in PR #318:
URL: https://github.com/apache/arrow-dotnet/pull/318#discussion_r3108112505
##########
src/Apache.Arrow/Ipc/ArrowMemoryReaderImplementation.cs:
##########
@@ -106,7 +106,7 @@ public override RecordBatch ReadNextRecordBatch()
CreateByteBuffer(_buffer.Slice(_bufferPosition,
messageLength)));
_bufferPosition += messageLength;
- int bodyLength = (int)message.BodyLength;
+ int bodyLength = checked((int)message.BodyLength);
Review Comment:
The PR changes overflow behavior for IPC length fields (now failing fast via
`checked`). Adding a unit test that constructs an IPC message with `BodyLength
> int.MaxValue` (or similarly oversized `RecordBatch.Length`) and asserts the
reader fails with the expected exception would help prevent regressions and
document the supported limits.
##########
src/Apache.Arrow/Ipc/ArrowReaderImplementation.cs:
##########
@@ -375,8 +375,8 @@ private ArrowBuffer BuildArrowBuffer(ByteBuffer bodyData,
Flatbuf.Buffer buffer,
return ArrowBuffer.Empty;
}
- int offset = (int)buffer.Offset;
- int length = (int)buffer.Length;
+ int offset = checked((int)buffer.Offset);
+ int length = checked((int)buffer.Length);
Review Comment:
`checked((int)buffer.Offset)` / `checked((int)buffer.Length)` can overflow
with a generic `OverflowException`. Since these values come from IPC metadata,
consider adding an explicit bounds check and throwing an `InvalidDataException`
with a descriptive message (including `buffer.Offset`/`buffer.Length`) so
callers can distinguish corrupt data from internal bugs.
```suggestion
if (buffer.Offset < 0 || buffer.Offset > int.MaxValue)
{
throw new InvalidDataException(
$"IPC buffer offset is out of range for a .NET buffer:
offset={buffer.Offset}, length={buffer.Length}.");
}
if (buffer.Length < 0 || buffer.Length > int.MaxValue)
{
throw new InvalidDataException(
$"IPC buffer length is out of range for a .NET buffer:
offset={buffer.Offset}, length={buffer.Length}.");
}
int offset = (int)buffer.Offset;
int length = (int)buffer.Length;
```
##########
src/Apache.Arrow/Ipc/ArrowReaderImplementation.cs:
##########
@@ -142,7 +142,7 @@ protected RecordBatch CreateArrowObjectFromMessage(
case Flatbuf.MessageHeader.RecordBatch:
Flatbuf.RecordBatch rb =
message.Header<Flatbuf.RecordBatch>().Value;
List<IArrowArray> arrays = BuildArrays(message.Version,
Schema, bodyByteBuffer, rb);
- return new RecordBatch(Schema, memoryOwner, arrays,
(int)rb.Length);
+ return new RecordBatch(Schema, memoryOwner, arrays,
checked((int)rb.Length));
Review Comment:
`checked((int)rb.Length)` will throw an `OverflowException` without context
if the IPC record batch length exceeds `int.MaxValue`. Consider doing an
explicit range check and throwing an `InvalidDataException`/`OverflowException`
with a message that includes the offending `rb.Length` to make
corrupt/unsupported IPC payloads easier to diagnose.
```suggestion
if (rb.Length > int.MaxValue)
{
throw new InvalidDataException(
$"IPC record batch length {rb.Length} exceeds
the maximum supported length of {int.MaxValue}.");
}
return new RecordBatch(Schema, memoryOwner, arrays,
(int)rb.Length);
```
##########
src/Apache.Arrow/Ipc/ArrowMemoryReaderImplementation.cs:
##########
@@ -106,7 +106,7 @@ public override RecordBatch ReadNextRecordBatch()
CreateByteBuffer(_buffer.Slice(_bufferPosition,
messageLength)));
_bufferPosition += messageLength;
- int bodyLength = (int)message.BodyLength;
+ int bodyLength = checked((int)message.BodyLength);
ByteBuffer bodybb =
CreateByteBuffer(_buffer.Slice(_bufferPosition, bodyLength));
_bufferPosition += bodyLength;
Review Comment:
After converting `message.BodyLength` to `int`,
`_buffer.Slice(_bufferPosition, bodyLength)` will throw
`ArgumentOutOfRangeException` for truncated/corrupt buffers. Since this is
untrusted IPC data, consider validating `message.BodyLength` is within `[0,
int.MaxValue]` and that `_bufferPosition + bodyLength <= _buffer.Length`, and
throw `InvalidDataException` with a clear message when it isn’t.
```suggestion
long bodyLength = message.BodyLength;
if (bodyLength < 0 || bodyLength > int.MaxValue)
{
throw new InvalidDataException($"Corrupted IPC message.
Invalid body length: {bodyLength}.");
}
int bodyLengthInt = (int)bodyLength;
if (bodyLengthInt > _buffer.Length - _bufferPosition)
{
throw new InvalidDataException(
$"Corrupted IPC message. Body length {bodyLengthInt}
exceeds remaining buffer length {_buffer.Length - _bufferPosition}.");
}
ByteBuffer bodybb =
CreateByteBuffer(_buffer.Slice(_bufferPosition, bodyLengthInt));
_bufferPosition += bodyLengthInt;
```
##########
src/Apache.Arrow/Ipc/ArrowReaderImplementation.cs:
##########
@@ -257,8 +257,8 @@ private ArrayData LoadField(
IBufferCreator bufferCreator)
{
- int fieldLength = (int)fieldNode.Length;
- int fieldNullCount = (int)fieldNode.NullCount;
+ int fieldLength = checked((int)fieldNode.Length);
+ int fieldNullCount = checked((int)fieldNode.NullCount);
Review Comment:
These `checked` casts can raise an `OverflowException` with no indication of
which field node was invalid. Consider validating
`fieldNode.Length`/`fieldNode.NullCount` are within `[0, int.MaxValue]` and
throwing `InvalidDataException` with the actual values (and ideally the field
name) to improve debuggability for corrupt IPC streams.
##########
src/Apache.Arrow.Flight/Internal/RecordBatchReaderImplementation.cs:
##########
@@ -135,7 +135,7 @@ public override async ValueTask<RecordBatch>
ReadNextRecordBatchAsync(Cancellati
{
case MessageHeader.RecordBatch:
var body = _flightDataStream.Current.DataBody.Memory;
- return CreateArrowObjectFromMessage(message,
CreateByteBuffer(body.Slice(0, (int)message.BodyLength)), null);
+ return CreateArrowObjectFromMessage(message,
CreateByteBuffer(body.Slice(0, checked((int)message.BodyLength))), null);
Review Comment:
`checked((int)message.BodyLength)` will throw an `OverflowException` without
context, and `body.Slice(0, ...)` will throw `ArgumentOutOfRangeException` if
the Flight message advertises a body length larger than the available buffer.
Consider explicitly validating `message.BodyLength` against `[0, body.Length]`
(and `int.MaxValue`) and throwing a more descriptive exception when it is
invalid.
```suggestion
long bodyLength = message.BodyLength;
if (bodyLength < 0 || bodyLength > int.MaxValue ||
bodyLength > body.Length)
{
throw new InvalidOperationException(
$"Invalid Flight message body length
{bodyLength}. Expected a value between 0 and {Math.Min(body.Length,
int.MaxValue)}.");
}
int validatedBodyLength = (int)bodyLength;
return CreateArrowObjectFromMessage(message,
CreateByteBuffer(body.Slice(0, validatedBodyLength)), null);
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]