This is an automated email from the ASF dual-hosted git repository.
curth pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/main by this push:
new 299ad70869 GH-43266: [C#] Add LargeBinary, LargeString and LargeList
array types (#43269)
299ad70869 is described below
commit 299ad7086928c555f3e861c20327276d1c4f2557
Author: Adam Reeve <[email protected]>
AuthorDate: Fri Jul 19 15:52:18 2024 +1200
GH-43266: [C#] Add LargeBinary, LargeString and LargeList array types
(#43269)
### Rationale for this change
See #43266. Note that LargeBinary and LargeString are still limited to 2
GiB buffers, and LargeList is limited to offsets that can be represented as
int32.
### What changes are included in this PR?
* Add new Array subtypes: LargeBinaryArray, LargeStringArray and
LargeListArray
* Support round-tripping these array types via the IPC format
* Support round-tripping these array types via the C Data Interface
* Improve error messages when importing arrays that are too large via IPC
or C Data Interface
* Enable integration tests for the new types
* Update documentation
### Are these changes tested?
Yes, I've added some basic tests specifically for the new array types, and
added these to the test data generator so they're covered by the existing tests
for round tripping using IPC and C Data Interface.
### Are there any user-facing changes?
Yes, this is a new user facing feature.
### Implementation notes
* I haven't added builders for these new array types. Given they're added
to help with interoperability with other libraries, I wouldn't expect .NET
users to build arrays of these types as they provide no other benefit over the
non-large types until we have proper large memory support. But I'm happy to add
this if it would be useful.
* The new array types share a lot of logic with the non-large types. I
considered trying to consolidate this logic by adding a new
`BinaryArrayBase<TOffset>` class for example, but I think this would require
generic math support to work nicely, and would still complicate the code quite
a bit and add extra virtual method call overhead. So I think it's fine to keep
these new Array subtypes independent from the non-large types.
* I haven't included support for materializing a LargeStringArray (see
#41048). I'm not sure whether there would be a use for this, but it could be
added later if needed.
* GitHub Issue: #43266
Authored-by: Adam Reeve <[email protected]>
Signed-off-by: Curt Hagenlocher <[email protected]>
---
csharp/README.md | 3 +-
.../src/Apache.Arrow/Arrays/ArrowArrayFactory.cs | 6 +
csharp/src/Apache.Arrow/Arrays/LargeBinaryArray.cs | 154 +++++++++++++++++++++
csharp/src/Apache.Arrow/Arrays/LargeListArray.cs | 97 +++++++++++++
csharp/src/Apache.Arrow/Arrays/LargeStringArray.cs | 113 +++++++++++++++
csharp/src/Apache.Arrow/C/CArrowArrayImporter.cs | 69 +++++++++
csharp/src/Apache.Arrow/C/CArrowSchemaExporter.cs | 5 +-
csharp/src/Apache.Arrow/C/CArrowSchemaImporter.cs | 14 +-
.../Apache.Arrow/Ipc/ArrowReaderImplementation.cs | 2 +
.../Ipc/ArrowStreamReaderImplementation.cs | 8 +-
csharp/src/Apache.Arrow/Ipc/ArrowStreamWriter.cs | 76 ++++++++++
.../Apache.Arrow/Ipc/ArrowTypeFlatbufferBuilder.cs | 29 +++-
csharp/src/Apache.Arrow/Ipc/MessageSerializer.cs | 10 ++
csharp/src/Apache.Arrow/Types/IArrowType.cs | 3 +
csharp/src/Apache.Arrow/Types/LargeBinaryType.cs | 27 ++++
.../Types/{IArrowType.cs => LargeListType.cs} | 59 ++------
csharp/src/Apache.Arrow/Types/LargeStringType.cs | 27 ++++
.../test/Apache.Arrow.IntegrationTest/JsonFile.cs | 88 ++++++++++++
.../test/Apache.Arrow.Tests/ArrowReaderVerifier.cs | 78 +++++++++++
.../Apache.Arrow.Tests/LargeBinaryArrayTests.cs | 95 +++++++++++++
.../test/Apache.Arrow.Tests/LargeListArrayTests.cs | 105 ++++++++++++++
.../Apache.Arrow.Tests/LargeStringArrayTests.cs | 91 ++++++++++++
csharp/test/Apache.Arrow.Tests/TableTests.cs | 4 +-
csharp/test/Apache.Arrow.Tests/TestData.cs | 118 ++++++++++++++++
dev/archery/archery/integration/datagen.py | 4 +-
docs/source/status.rst | 8 +-
26 files changed, 1231 insertions(+), 62 deletions(-)
diff --git a/csharp/README.md b/csharp/README.md
index b36eb899db..663aaf8ab2 100644
--- a/csharp/README.md
+++ b/csharp/README.md
@@ -129,7 +129,8 @@ for currently available features.
- Types
- Tensor
- Arrays
- - Large Arrays
+ - Large Arrays. There are large array types provided to help with
interoperability with other libraries,
+ but these do not support buffers larger than 2 GiB and an exception will
be raised if trying to import an array that is too large.
- Large Binary
- Large List
- Large String
diff --git a/csharp/src/Apache.Arrow/Arrays/ArrowArrayFactory.cs
b/csharp/src/Apache.Arrow/Arrays/ArrowArrayFactory.cs
index 67c4b21a2e..bd06c3a1b8 100644
--- a/csharp/src/Apache.Arrow/Arrays/ArrowArrayFactory.cs
+++ b/csharp/src/Apache.Arrow/Arrays/ArrowArrayFactory.cs
@@ -53,18 +53,24 @@ namespace Apache.Arrow
return new StringArray(data);
case ArrowTypeId.StringView:
return new StringViewArray(data);
+ case ArrowTypeId.LargeString:
+ return new LargeStringArray(data);
case ArrowTypeId.FixedSizedBinary:
return new FixedSizeBinaryArray(data);
case ArrowTypeId.Binary:
return new BinaryArray(data);
case ArrowTypeId.BinaryView:
return new BinaryViewArray(data);
+ case ArrowTypeId.LargeBinary:
+ return new LargeBinaryArray(data);
case ArrowTypeId.Timestamp:
return new TimestampArray(data);
case ArrowTypeId.List:
return new ListArray(data);
case ArrowTypeId.ListView:
return new ListViewArray(data);
+ case ArrowTypeId.LargeList:
+ return new LargeListArray(data);
case ArrowTypeId.Map:
return new MapArray(data);
case ArrowTypeId.Struct:
diff --git a/csharp/src/Apache.Arrow/Arrays/LargeBinaryArray.cs
b/csharp/src/Apache.Arrow/Arrays/LargeBinaryArray.cs
new file mode 100644
index 0000000000..9eddbedab5
--- /dev/null
+++ b/csharp/src/Apache.Arrow/Arrays/LargeBinaryArray.cs
@@ -0,0 +1,154 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements. See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+using Apache.Arrow.Types;
+using System;
+using System.Collections;
+using System.Collections.Generic;
+using System.Runtime.CompilerServices;
+
+namespace Apache.Arrow;
+
+public class LargeBinaryArray : Array, IReadOnlyList<byte[]>,
ICollection<byte[]>
+{
+ public LargeBinaryArray(ArrayData data)
+ : base(data)
+ {
+ data.EnsureDataType(ArrowTypeId.LargeBinary);
+ data.EnsureBufferCount(3);
+ }
+
+ public LargeBinaryArray(ArrowTypeId typeId, ArrayData data)
+ : base(data)
+ {
+ data.EnsureDataType(typeId);
+ data.EnsureBufferCount(3);
+ }
+
+ public LargeBinaryArray(IArrowType dataType, int length,
+ ArrowBuffer valueOffsetsBuffer,
+ ArrowBuffer dataBuffer,
+ ArrowBuffer nullBitmapBuffer,
+ int nullCount = 0, int offset = 0)
+ : this(new ArrayData(dataType, length, nullCount, offset,
+ new[] { nullBitmapBuffer, valueOffsetsBuffer, dataBuffer }))
+ { }
+
+ public override void Accept(IArrowArrayVisitor visitor) => Accept(this,
visitor);
+
+ public ArrowBuffer ValueOffsetsBuffer => Data.Buffers[1];
+
+ public ArrowBuffer ValueBuffer => Data.Buffers[2];
+
+ public ReadOnlySpan<long> ValueOffsets =>
ValueOffsetsBuffer.Span.CastTo<long>().Slice(Offset, Length + 1);
+
+ [MethodImpl(MethodImplOptions.AggressiveInlining)]
+ public int GetValueLength(int index)
+ {
+ if (index < 0 || index >= Length)
+ {
+ throw new ArgumentOutOfRangeException(nameof(index));
+ }
+ if (!IsValid(index))
+ {
+ return 0;
+ }
+
+ ReadOnlySpan<long> offsets = ValueOffsets;
+ return checked((int)(offsets[index + 1] - offsets[index]));
+ }
+
+ /// <summary>
+ /// Get the collection of bytes, as a read-only span, at a given index in
the array.
+ /// </summary>
+ /// <remarks>
+ /// Note that this method cannot reliably identify null values, which are
indistinguishable from empty byte
+ /// collection values when seen in the context of this method's return
type of <see cref="ReadOnlySpan{Byte}"/>.
+ /// Use the <see cref="Array.IsNull"/> method or the <see
cref="GetBytes(int, out bool)"/> overload instead
+ /// to reliably determine null values.
+ /// </remarks>
+ /// <param name="index">Index at which to get bytes.</param>
+ /// <returns>Returns a <see cref="ReadOnlySpan{Byte}"/> object.</returns>
+ /// <exception cref="ArgumentOutOfRangeException">If the index is negative
or beyond the length of the array.
+ /// </exception>
+ public ReadOnlySpan<byte> GetBytes(int index) => GetBytes(index, out _);
+
+ /// <summary>
+ /// Get the collection of bytes, as a read-only span, at a given index in
the array.
+ /// </summary>
+ /// <param name="index">Index at which to get bytes.</param>
+ /// <param name="isNull">Set to <see langword="true"/> if the value at the
given index is null.</param>
+ /// <returns>Returns a <see cref="ReadOnlySpan{Byte}"/> object.</returns>
+ /// <exception cref="ArgumentOutOfRangeException">If the index is negative
or beyond the length of the array.
+ /// </exception>
+ public ReadOnlySpan<byte> GetBytes(int index, out bool isNull)
+ {
+ if (index < 0 || index >= Length)
+ {
+ throw new ArgumentOutOfRangeException(nameof(index));
+ }
+
+ isNull = IsNull(index);
+
+ if (isNull)
+ {
+ // Note that `return null;` is valid syntax, but would be
misleading as `null` in the context of a span
+ // is actually returned as an empty span.
+ return ReadOnlySpan<byte>.Empty;
+ }
+
+ var offset = checked((int)ValueOffsets[index]);
+ return ValueBuffer.Span.Slice(offset, GetValueLength(index));
+ }
+
+ int IReadOnlyCollection<byte[]>.Count => Length;
+
+ byte[] IReadOnlyList<byte[]>.this[int index] => GetBytes(index).ToArray();
+
+ IEnumerator<byte[]> IEnumerable<byte[]>.GetEnumerator()
+ {
+ for (int index = 0; index < Length; index++)
+ {
+ yield return GetBytes(index).ToArray();
+ }
+ }
+
+ IEnumerator IEnumerable.GetEnumerator() =>
((IEnumerable<byte[]>)this).GetEnumerator();
+
+ int ICollection<byte[]>.Count => Length;
+ bool ICollection<byte[]>.IsReadOnly => true;
+ void ICollection<byte[]>.Add(byte[] item) => throw new
NotSupportedException("Collection is read-only.");
+ bool ICollection<byte[]>.Remove(byte[] item) => throw new
NotSupportedException("Collection is read-only.");
+ void ICollection<byte[]>.Clear() => throw new
NotSupportedException("Collection is read-only.");
+
+ bool ICollection<byte[]>.Contains(byte[] item)
+ {
+ for (int index = 0; index < Length; index++)
+ {
+ if (GetBytes(index).SequenceEqual(item))
+ return true;
+ }
+
+ return false;
+ }
+
+ void ICollection<byte[]>.CopyTo(byte[][] array, int arrayIndex)
+ {
+ for (int srcIndex = 0, destIndex = arrayIndex; srcIndex < Length;
srcIndex++, destIndex++)
+ {
+ array[destIndex] = GetBytes(srcIndex).ToArray();
+ }
+ }
+}
diff --git a/csharp/src/Apache.Arrow/Arrays/LargeListArray.cs
b/csharp/src/Apache.Arrow/Arrays/LargeListArray.cs
new file mode 100644
index 0000000000..6e37aa4c63
--- /dev/null
+++ b/csharp/src/Apache.Arrow/Arrays/LargeListArray.cs
@@ -0,0 +1,97 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements. See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+using System;
+using Apache.Arrow.Types;
+
+namespace Apache.Arrow
+{
+ public class LargeListArray : Array
+ {
+ public IArrowArray Values { get; }
+
+ public ArrowBuffer ValueOffsetsBuffer => Data.Buffers[1];
+
+ public ReadOnlySpan<long> ValueOffsets =>
ValueOffsetsBuffer.Span.CastTo<long>().Slice(Offset, Length + 1);
+
+ public LargeListArray(IArrowType dataType, int length,
+ ArrowBuffer valueOffsetsBuffer, IArrowArray values,
+ ArrowBuffer nullBitmapBuffer, int nullCount = 0, int offset = 0)
+ : this(new ArrayData(dataType, length, nullCount, offset,
+ new[] { nullBitmapBuffer, valueOffsetsBuffer }, new[] {
values.Data }),
+ values)
+ {
+ }
+
+ public LargeListArray(ArrayData data)
+ : this(data, ArrowArrayFactory.BuildArray(data.Children[0]))
+ {
+ }
+
+ private LargeListArray(ArrayData data, IArrowArray values) : base(data)
+ {
+ data.EnsureBufferCount(2);
+ data.EnsureDataType(ArrowTypeId.LargeList);
+ Values = values;
+ }
+
+ public override void Accept(IArrowArrayVisitor visitor) =>
Accept(this, visitor);
+
+ public int GetValueLength(int index)
+ {
+ if (index < 0 || index >= Length)
+ {
+ throw new ArgumentOutOfRangeException(nameof(index));
+ }
+
+ if (IsNull(index))
+ {
+ return 0;
+ }
+
+ ReadOnlySpan<long> offsets = ValueOffsets;
+ return checked((int)(offsets[index + 1] - offsets[index]));
+ }
+
+ public IArrowArray GetSlicedValues(int index)
+ {
+ if (index < 0 || index >= Length)
+ {
+ throw new ArgumentOutOfRangeException(nameof(index));
+ }
+
+ if (IsNull(index))
+ {
+ return null;
+ }
+
+ if (!(Values is Array array))
+ {
+ return default;
+ }
+
+ return array.Slice(checked((int)ValueOffsets[index]),
GetValueLength(index));
+ }
+
+ protected override void Dispose(bool disposing)
+ {
+ if (disposing)
+ {
+ Values?.Dispose();
+ }
+ base.Dispose(disposing);
+ }
+ }
+}
diff --git a/csharp/src/Apache.Arrow/Arrays/LargeStringArray.cs
b/csharp/src/Apache.Arrow/Arrays/LargeStringArray.cs
new file mode 100644
index 0000000000..2a65b828ac
--- /dev/null
+++ b/csharp/src/Apache.Arrow/Arrays/LargeStringArray.cs
@@ -0,0 +1,113 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements. See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+using System;
+using System.Collections;
+using System.Collections.Generic;
+using System.Runtime.InteropServices;
+using System.Text;
+using Apache.Arrow.Types;
+
+namespace Apache.Arrow;
+
+public class LargeStringArray: LargeBinaryArray, IReadOnlyList<string>,
ICollection<string>
+{
+ public static readonly Encoding DefaultEncoding =
StringArray.DefaultEncoding;
+
+ public LargeStringArray(ArrayData data)
+ : base(ArrowTypeId.LargeString, data) { }
+
+ public LargeStringArray(int length,
+ ArrowBuffer valueOffsetsBuffer,
+ ArrowBuffer dataBuffer,
+ ArrowBuffer nullBitmapBuffer,
+ int nullCount = 0, int offset = 0)
+ : this(new ArrayData(LargeStringType.Default, length, nullCount,
offset,
+ new[] { nullBitmapBuffer, valueOffsetsBuffer, dataBuffer }))
+ { }
+
+ public override void Accept(IArrowArrayVisitor visitor) => Accept(this,
visitor);
+
+ /// <summary>
+ /// Get the string value at the given index
+ /// </summary>
+ /// <param name="index">Input index</param>
+ /// <param name="encoding">Optional: the string encoding, default is
UTF8</param>
+ /// <returns>The string object at the given index</returns>
+ public string GetString(int index, Encoding encoding = default)
+ {
+ encoding ??= DefaultEncoding;
+
+ ReadOnlySpan<byte> bytes = GetBytes(index, out bool isNull);
+
+ if (isNull)
+ {
+ return null;
+ }
+
+ if (bytes.Length == 0)
+ {
+ return string.Empty;
+ }
+
+ unsafe
+ {
+ fixed (byte* data = &MemoryMarshal.GetReference(bytes))
+ {
+ return encoding.GetString(data, bytes.Length);
+ }
+ }
+ }
+
+
+ int IReadOnlyCollection<string>.Count => Length;
+
+ string IReadOnlyList<string>.this[int index] => GetString(index);
+
+ IEnumerator<string> IEnumerable<string>.GetEnumerator()
+ {
+ for (int index = 0; index < Length; index++)
+ {
+ yield return GetString(index);
+ };
+ }
+
+ IEnumerator IEnumerable.GetEnumerator() =>
((IEnumerable<string>)this).GetEnumerator();
+
+ int ICollection<string>.Count => Length;
+ bool ICollection<string>.IsReadOnly => true;
+ void ICollection<string>.Add(string item) => throw new
NotSupportedException("Collection is read-only.");
+ bool ICollection<string>.Remove(string item) => throw new
NotSupportedException("Collection is read-only.");
+ void ICollection<string>.Clear() => throw new
NotSupportedException("Collection is read-only.");
+
+ bool ICollection<string>.Contains(string item)
+ {
+ for (int index = 0; index < Length; index++)
+ {
+ if (GetString(index) == item)
+ return true;
+ }
+
+ return false;
+ }
+
+ void ICollection<string>.CopyTo(string[] array, int arrayIndex)
+ {
+ for (int srcIndex = 0, destIndex = arrayIndex; srcIndex < Length;
srcIndex++, destIndex++)
+ {
+ array[destIndex] = GetString(srcIndex);
+ }
+ }
+}
diff --git a/csharp/src/Apache.Arrow/C/CArrowArrayImporter.cs
b/csharp/src/Apache.Arrow/C/CArrowArrayImporter.cs
index abe02dcbb5..68b67f3d7c 100644
--- a/csharp/src/Apache.Arrow/C/CArrowArrayImporter.cs
+++ b/csharp/src/Apache.Arrow/C/CArrowArrayImporter.cs
@@ -162,6 +162,10 @@ namespace Apache.Arrow.C
case ArrowTypeId.BinaryView:
buffers = ImportByteArrayViewBuffers(cArray);
break;
+ case ArrowTypeId.LargeString:
+ case ArrowTypeId.LargeBinary:
+ buffers = ImportLargeByteArrayBuffers(cArray);
+ break;
case ArrowTypeId.List:
children = ProcessListChildren(cArray,
((ListType)type).ValueDataType);
buffers = ImportListBuffers(cArray);
@@ -170,6 +174,10 @@ namespace Apache.Arrow.C
children = ProcessListChildren(cArray,
((ListViewType)type).ValueDataType);
buffers = ImportListViewBuffers(cArray);
break;
+ case ArrowTypeId.LargeList:
+ children = ProcessListChildren(cArray,
((LargeListType)type).ValueDataType);
+ buffers = ImportLargeListBuffers(cArray);
+ break;
case ArrowTypeId.FixedSizeList:
children = ProcessListChildren(cArray,
((FixedSizeListType)type).ValueDataType);
buffers = ImportFixedSizeListBuffers(cArray);
@@ -313,6 +321,42 @@ namespace Apache.Arrow.C
return buffers;
}
+ private ArrowBuffer[] ImportLargeByteArrayBuffers(CArrowArray*
cArray)
+ {
+ if (cArray->n_buffers != 3)
+ {
+ throw new InvalidOperationException("Large byte arrays are
expected to have exactly three buffers");
+ }
+
+ const int maxLength = int.MaxValue / 8 - 1;
+ if (cArray->length > maxLength)
+ {
+ throw new OverflowException(
+ $"Cannot import large byte array. Array length
{cArray->length} " +
+ $"is greater than the maximum supported large byte
array length ({maxLength})");
+ }
+
+ int length = (int)cArray->length;
+ int offsetsLength = (length + 1) * 8;
+ long* offsets = (long*)cArray->buffers[1];
+ Debug.Assert(offsets != null);
+ long valuesLength = offsets[length];
+
+ if (valuesLength > int.MaxValue)
+ {
+ throw new OverflowException(
+ $"Cannot import large byte array. Data length
{valuesLength} " +
+ $"is greater than the maximum supported large byte
array data length ({int.MaxValue})");
+ }
+
+ ArrowBuffer[] buffers = new ArrowBuffer[3];
+ buffers[0] = ImportValidityBuffer(cArray);
+ buffers[1] = ImportCArrayBuffer(cArray, 1, offsetsLength);
+ buffers[2] = ImportCArrayBuffer(cArray, 2, (int)valuesLength);
+
+ return buffers;
+ }
+
private ArrowBuffer[] ImportListBuffers(CArrowArray* cArray)
{
if (cArray->n_buffers != 2)
@@ -348,6 +392,31 @@ namespace Apache.Arrow.C
return buffers;
}
+ private ArrowBuffer[] ImportLargeListBuffers(CArrowArray* cArray)
+ {
+ if (cArray->n_buffers != 2)
+ {
+ throw new InvalidOperationException("Large list arrays are
expected to have exactly two buffers");
+ }
+
+ const int maxLength = int.MaxValue / 8 - 1;
+ if (cArray->length > maxLength)
+ {
+ throw new OverflowException(
+ $"Cannot import large list array. Array length
{cArray->length} " +
+ $"is greater than the maximum supported large list
array length ({maxLength})");
+ }
+
+ int length = (int)cArray->length;
+ int offsetsLength = (length + 1) * 8;
+
+ ArrowBuffer[] buffers = new ArrowBuffer[2];
+ buffers[0] = ImportValidityBuffer(cArray);
+ buffers[1] = ImportCArrayBuffer(cArray, 1, offsetsLength);
+
+ return buffers;
+ }
+
private ArrowBuffer[] ImportFixedSizeListBuffers(CArrowArray*
cArray)
{
if (cArray->n_buffers != 1)
diff --git a/csharp/src/Apache.Arrow/C/CArrowSchemaExporter.cs
b/csharp/src/Apache.Arrow/C/CArrowSchemaExporter.cs
index 3bb7134af3..92d48a2d70 100644
--- a/csharp/src/Apache.Arrow/C/CArrowSchemaExporter.cs
+++ b/csharp/src/Apache.Arrow/C/CArrowSchemaExporter.cs
@@ -168,8 +168,10 @@ namespace Apache.Arrow.C
// Binary
case BinaryType _: return "z";
case BinaryViewType _: return "vz";
+ case LargeBinaryType _: return "Z";
case StringType _: return "u";
case StringViewType _: return "vu";
+ case LargeStringType _: return "U";
case FixedSizeBinaryType binaryType:
return $"w:{binaryType.ByteWidth}";
// Date
@@ -199,6 +201,7 @@ namespace Apache.Arrow.C
// Nested
case ListType _: return "+l";
case ListViewType _: return "+vl";
+ case LargeListType _: return "+L";
case FixedSizeListType fixedListType:
return $"+w:{fixedListType.ListSize}";
case StructType _: return "+s";
@@ -208,7 +211,7 @@ namespace Apache.Arrow.C
case DictionaryType dictionaryType:
return GetFormat(dictionaryType.IndexType);
default: throw new NotImplementedException($"Exporting
{datatype.Name} not implemented");
- };
+ }
}
private static long GetFlags(IArrowType datatype, bool nullable = true)
diff --git a/csharp/src/Apache.Arrow/C/CArrowSchemaImporter.cs
b/csharp/src/Apache.Arrow/C/CArrowSchemaImporter.cs
index f1acc007bc..94177184de 100644
--- a/csharp/src/Apache.Arrow/C/CArrowSchemaImporter.cs
+++ b/csharp/src/Apache.Arrow/C/CArrowSchemaImporter.cs
@@ -165,7 +165,7 @@ namespace Apache.Arrow.C
}
// Special handling for nested types
- if (format == "+l" || format == "+vl")
+ if (format == "+l" || format == "+vl" || format == "+L")
{
if (_cSchema->n_children != 1)
{
@@ -180,7 +180,13 @@ namespace Apache.Arrow.C
Field childField = childSchema.GetAsField();
- return format[1] == 'v' ? new ListViewType(childField) :
new ListType(childField);
+ return format[1] switch
+ {
+ 'l' => new ListType(childField),
+ 'v' => new ListViewType(childField),
+ 'L' => new LargeListType(childField),
+ _ => throw new InvalidDataException($"Invalid format
for list: '{format}'"),
+ };
}
else if (format == "+s")
{
@@ -304,10 +310,10 @@ namespace Apache.Arrow.C
// Binary data
"z" => BinaryType.Default,
"vz" => BinaryViewType.Default,
- //"Z" => new LargeBinaryType() // Not yet implemented
+ "Z" => LargeBinaryType.Default,
"u" => StringType.Default,
"vu" => StringViewType.Default,
- //"U" => new LargeStringType(), // Not yet implemented
+ "U" => LargeStringType.Default,
// Date and time
"tdD" => Date32Type.Default,
"tdm" => Date64Type.Default,
diff --git a/csharp/src/Apache.Arrow/Ipc/ArrowReaderImplementation.cs
b/csharp/src/Apache.Arrow/Ipc/ArrowReaderImplementation.cs
index a37c501072..7e766677f8 100644
--- a/csharp/src/Apache.Arrow/Ipc/ArrowReaderImplementation.cs
+++ b/csharp/src/Apache.Arrow/Ipc/ArrowReaderImplementation.cs
@@ -291,6 +291,8 @@ namespace Apache.Arrow.Ipc
break;
case ArrowTypeId.String:
case ArrowTypeId.Binary:
+ case ArrowTypeId.LargeString:
+ case ArrowTypeId.LargeBinary:
case ArrowTypeId.ListView:
buffers = 3;
break;
diff --git a/csharp/src/Apache.Arrow/Ipc/ArrowStreamReaderImplementation.cs
b/csharp/src/Apache.Arrow/Ipc/ArrowStreamReaderImplementation.cs
index 5583a58487..12a2a17cf0 100644
--- a/csharp/src/Apache.Arrow/Ipc/ArrowStreamReaderImplementation.cs
+++ b/csharp/src/Apache.Arrow/Ipc/ArrowStreamReaderImplementation.cs
@@ -132,7 +132,13 @@ namespace Apache.Arrow.Ipc
Flatbuf.Message message =
Flatbuf.Message.GetRootAsMessage(CreateByteBuffer(messageBuff));
- int bodyLength = checked((int)message.BodyLength);
+ if (message.BodyLength > int.MaxValue)
+ {
+ throw new OverflowException(
+ $"Arrow IPC message body length ({message.BodyLength})
is larger than " +
+ $"the maximum supported message size
({int.MaxValue})");
+ }
+ int bodyLength = (int)message.BodyLength;
IMemoryOwner<byte> bodyBuffOwner =
_allocator.Allocate(bodyLength);
Memory<byte> bodyBuff = bodyBuffOwner.Memory.Slice(0,
bodyLength);
diff --git a/csharp/src/Apache.Arrow/Ipc/ArrowStreamWriter.cs
b/csharp/src/Apache.Arrow/Ipc/ArrowStreamWriter.cs
index c66569afeb..eaa8471fa7 100644
--- a/csharp/src/Apache.Arrow/Ipc/ArrowStreamWriter.cs
+++ b/csharp/src/Apache.Arrow/Ipc/ArrowStreamWriter.cs
@@ -57,11 +57,14 @@ namespace Apache.Arrow.Ipc
IArrowArrayVisitor<MonthDayNanosecondIntervalArray>,
IArrowArrayVisitor<ListArray>,
IArrowArrayVisitor<ListViewArray>,
+ IArrowArrayVisitor<LargeListArray>,
IArrowArrayVisitor<FixedSizeListArray>,
IArrowArrayVisitor<StringArray>,
IArrowArrayVisitor<StringViewArray>,
+ IArrowArrayVisitor<LargeStringArray>,
IArrowArrayVisitor<BinaryArray>,
IArrowArrayVisitor<BinaryViewArray>,
+ IArrowArrayVisitor<LargeBinaryArray>,
IArrowArrayVisitor<FixedSizeBinaryArray>,
IArrowArrayVisitor<StructArray>,
IArrowArrayVisitor<UnionArray>,
@@ -199,6 +202,28 @@ namespace Apache.Arrow.Ipc
VisitArray(values);
}
+ public void Visit(LargeListArray array)
+ {
+ _buffers.Add(CreateBitmapBuffer(array.NullBitmapBuffer,
array.Offset, array.Length));
+
_buffers.Add(CreateBuffer(GetZeroBasedLongValueOffsets(array.ValueOffsetsBuffer,
array.Offset, array.Length)));
+
+ int valuesOffset = 0;
+ int valuesLength = 0;
+ if (array.Length > 0)
+ {
+ valuesOffset = checked((int)array.ValueOffsets[0]);
+ valuesLength =
checked((int)array.ValueOffsets[array.Length] - valuesOffset);
+ }
+
+ var values = array.Values;
+ if (valuesOffset > 0 || valuesLength < values.Length)
+ {
+ values = ArrowArrayFactory.Slice(values, valuesOffset,
valuesLength);
+ }
+
+ VisitArray(values);
+ }
+
public void Visit(FixedSizeListArray array)
{
_buffers.Add(CreateBitmapBuffer(array.NullBitmapBuffer,
array.Offset, array.Length));
@@ -214,6 +239,8 @@ namespace Apache.Arrow.Ipc
public void Visit(StringViewArray array) => Visit(array as
BinaryViewArray);
+ public void Visit(LargeStringArray array) => Visit(array as
LargeBinaryArray);
+
public void Visit(BinaryArray array)
{
_buffers.Add(CreateBitmapBuffer(array.NullBitmapBuffer,
array.Offset, array.Length));
@@ -242,6 +269,22 @@ namespace Apache.Arrow.Ipc
VariadicCounts.Add(array.DataBufferCount);
}
+ public void Visit(LargeBinaryArray array)
+ {
+ _buffers.Add(CreateBitmapBuffer(array.NullBitmapBuffer,
array.Offset, array.Length));
+
_buffers.Add(CreateBuffer(GetZeroBasedLongValueOffsets(array.ValueOffsetsBuffer,
array.Offset, array.Length)));
+
+ int valuesOffset = 0;
+ int valuesLength = 0;
+ if (array.Length > 0)
+ {
+ valuesOffset = checked((int)array.ValueOffsets[0]);
+ valuesLength =
checked((int)array.ValueOffsets[array.Length]) - valuesOffset;
+ }
+
+ _buffers.Add(CreateSlicedBuffer<byte>(array.ValueBuffer,
valuesOffset, valuesLength));
+ }
+
public void Visit(FixedSizeBinaryArray array)
{
var itemSize =
((FixedSizeBinaryType)array.Data.DataType).ByteWidth;
@@ -327,6 +370,39 @@ namespace Apache.Arrow.Ipc
}
}
+ private ArrowBuffer GetZeroBasedLongValueOffsets(ArrowBuffer
valueOffsetsBuffer, int arrayOffset, int arrayLength)
+ {
+ var requiredBytes =
CalculatePaddedBufferLength(checked(sizeof(long) * (arrayLength + 1)));
+
+ if (arrayOffset != 0)
+ {
+ // Array has been sliced, so we need to shift and adjust
the offsets
+ var originalOffsets =
valueOffsetsBuffer.Span.CastTo<long>().Slice(arrayOffset, arrayLength + 1);
+ var firstOffset = arrayLength > 0 ? originalOffsets[0] :
0L;
+
+ var newValueOffsetsBuffer =
_allocator.Allocate(requiredBytes);
+ var newValueOffsets =
newValueOffsetsBuffer.Memory.Span.CastTo<long>();
+
+ for (int i = 0; i < arrayLength + 1; ++i)
+ {
+ newValueOffsets[i] = originalOffsets[i] - firstOffset;
+ }
+
+ return new ArrowBuffer(newValueOffsetsBuffer);
+ }
+ else if (valueOffsetsBuffer.Length > requiredBytes)
+ {
+ // Array may have been sliced but the offset is zero,
+ // so we can truncate the existing offsets
+ return new ArrowBuffer(valueOffsetsBuffer.Memory.Slice(0,
requiredBytes));
+ }
+ else
+ {
+ // Use the full buffer
+ return valueOffsetsBuffer;
+ }
+ }
+
private (ArrowBuffer Buffer, int minOffset, int maxEnd)
GetZeroBasedListViewOffsets(ListViewArray array)
{
if (array.Length == 0)
diff --git a/csharp/src/Apache.Arrow/Ipc/ArrowTypeFlatbufferBuilder.cs
b/csharp/src/Apache.Arrow/Ipc/ArrowTypeFlatbufferBuilder.cs
index 473e18968f..adc229a051 100644
--- a/csharp/src/Apache.Arrow/Ipc/ArrowTypeFlatbufferBuilder.cs
+++ b/csharp/src/Apache.Arrow/Ipc/ArrowTypeFlatbufferBuilder.cs
@@ -57,6 +57,7 @@ namespace Apache.Arrow.Ipc
IArrowTypeVisitor<DoubleType>,
IArrowTypeVisitor<StringType>,
IArrowTypeVisitor<StringViewType>,
+ IArrowTypeVisitor<LargeStringType>,
IArrowTypeVisitor<Date32Type>,
IArrowTypeVisitor<Date64Type>,
IArrowTypeVisitor<Time32Type>,
@@ -65,9 +66,11 @@ namespace Apache.Arrow.Ipc
IArrowTypeVisitor<IntervalType>,
IArrowTypeVisitor<BinaryType>,
IArrowTypeVisitor<BinaryViewType>,
+ IArrowTypeVisitor<LargeBinaryType>,
IArrowTypeVisitor<TimestampType>,
IArrowTypeVisitor<ListType>,
IArrowTypeVisitor<ListViewType>,
+ IArrowTypeVisitor<LargeListType>,
IArrowTypeVisitor<FixedSizeListType>,
IArrowTypeVisitor<UnionType>,
IArrowTypeVisitor<StructType>,
@@ -120,6 +123,14 @@ namespace Apache.Arrow.Ipc
Flatbuf.Type.BinaryView, offset);
}
+ public void Visit(LargeBinaryType type)
+ {
+ Flatbuf.LargeBinary.StartLargeBinary(Builder);
+ Offset<LargeBinary> offset =
Flatbuf.LargeBinary.EndLargeBinary(Builder);
+ Result = FieldType.Build(
+ Flatbuf.Type.LargeBinary, offset);
+ }
+
public void Visit(ListType type)
{
Flatbuf.List.StartList(Builder);
@@ -136,6 +147,14 @@ namespace Apache.Arrow.Ipc
Flatbuf.ListView.EndListView(Builder));
}
+ public void Visit(LargeListType type)
+ {
+ Flatbuf.LargeList.StartLargeList(Builder);
+ Result = FieldType.Build(
+ Flatbuf.Type.LargeList,
+ Flatbuf.LargeList.EndLargeList(Builder));
+ }
+
public void Visit(FixedSizeListType type)
{
Result = FieldType.Build(
@@ -166,6 +185,14 @@ namespace Apache.Arrow.Ipc
Flatbuf.Type.Utf8View, offset);
}
+ public void Visit(LargeStringType type)
+ {
+ Flatbuf.LargeUtf8.StartLargeUtf8(Builder);
+ Offset<LargeUtf8> offset =
Flatbuf.LargeUtf8.EndLargeUtf8(Builder);
+ Result = FieldType.Build(
+ Flatbuf.Type.LargeUtf8, offset);
+ }
+
public void Visit(TimestampType type)
{
StringOffset timezoneStringOffset = default;
@@ -363,7 +390,7 @@ namespace Apache.Arrow.Ipc
Types.IntervalUnit.DayTime => Flatbuf.IntervalUnit.DAY_TIME,
Types.IntervalUnit.MonthDayNanosecond =>
Flatbuf.IntervalUnit.MONTH_DAY_NANO,
_ => throw new ArgumentException($"unsupported interval unit
<{unit}>", nameof(unit))
- }; ;
+ };
}
}
}
diff --git a/csharp/src/Apache.Arrow/Ipc/MessageSerializer.cs
b/csharp/src/Apache.Arrow/Ipc/MessageSerializer.cs
index 0e6f330aef..8e15632c51 100644
--- a/csharp/src/Apache.Arrow/Ipc/MessageSerializer.cs
+++ b/csharp/src/Apache.Arrow/Ipc/MessageSerializer.cs
@@ -186,6 +186,8 @@ namespace Apache.Arrow.Ipc
return Types.StringType.Default;
case Flatbuf.Type.Utf8View:
return Types.StringViewType.Default;
+ case Flatbuf.Type.LargeUtf8:
+ return Types.LargeStringType.Default;
case Flatbuf.Type.FixedSizeBinary:
Flatbuf.FixedSizeBinary fixedSizeBinaryMetadata =
field.Type<Flatbuf.FixedSizeBinary>().Value;
return new
Types.FixedSizeBinaryType(fixedSizeBinaryMetadata.ByteWidth);
@@ -193,6 +195,8 @@ namespace Apache.Arrow.Ipc
return Types.BinaryType.Default;
case Flatbuf.Type.BinaryView:
return Types.BinaryViewType.Default;
+ case Flatbuf.Type.LargeBinary:
+ return Types.LargeBinaryType.Default;
case Flatbuf.Type.List:
if (childFields == null || childFields.Length != 1)
{
@@ -205,6 +209,12 @@ namespace Apache.Arrow.Ipc
throw new InvalidDataException($"List view type must
have exactly one child.");
}
return new Types.ListViewType(childFields[0]);
+ case Flatbuf.Type.LargeList:
+ if (childFields == null || childFields.Length != 1)
+ {
+ throw new InvalidDataException($"Large list type must
have exactly one child.");
+ }
+ return new Types.LargeListType(childFields[0]);
case Flatbuf.Type.FixedSizeList:
if (childFields == null || childFields.Length != 1)
{
diff --git a/csharp/src/Apache.Arrow/Types/IArrowType.cs
b/csharp/src/Apache.Arrow/Types/IArrowType.cs
index cf520391fe..7a3159a1bb 100644
--- a/csharp/src/Apache.Arrow/Types/IArrowType.cs
+++ b/csharp/src/Apache.Arrow/Types/IArrowType.cs
@@ -53,6 +53,9 @@ namespace Apache.Arrow.Types
BinaryView,
StringView,
ListView,
+ LargeList,
+ LargeBinary,
+ LargeString,
}
public interface IArrowType
diff --git a/csharp/src/Apache.Arrow/Types/LargeBinaryType.cs
b/csharp/src/Apache.Arrow/Types/LargeBinaryType.cs
new file mode 100644
index 0000000000..e22c333824
--- /dev/null
+++ b/csharp/src/Apache.Arrow/Types/LargeBinaryType.cs
@@ -0,0 +1,27 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements. See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+namespace Apache.Arrow.Types;
+
+public class LargeBinaryType: ArrowType
+{
+ public static readonly LargeBinaryType Default = new LargeBinaryType();
+
+ public override ArrowTypeId TypeId => ArrowTypeId.LargeBinary;
+
+ public override string Name => "large_binary";
+
+ public override void Accept(IArrowTypeVisitor visitor) => Accept(this,
visitor);
+}
diff --git a/csharp/src/Apache.Arrow/Types/IArrowType.cs
b/csharp/src/Apache.Arrow/Types/LargeListType.cs
similarity index 50%
copy from csharp/src/Apache.Arrow/Types/IArrowType.cs
copy to csharp/src/Apache.Arrow/Types/LargeListType.cs
index cf520391fe..2fe8166972 100644
--- a/csharp/src/Apache.Arrow/Types/IArrowType.cs
+++ b/csharp/src/Apache.Arrow/Types/LargeListType.cs
@@ -13,57 +13,24 @@
// See the License for the specific language governing permissions and
// limitations under the License.
-
namespace Apache.Arrow.Types
{
- public enum ArrowTypeId
+ public sealed class LargeListType : NestedType
{
- Null,
- Boolean,
- UInt8,
- Int8,
- UInt16,
- Int16,
- UInt32,
- Int32,
- UInt64,
- Int64,
- HalfFloat,
- Float,
- Double,
- String,
- Binary,
- FixedSizedBinary,
- Date32,
- Date64,
- Timestamp,
- Time32,
- Time64,
- Interval,
- Decimal128,
- Decimal256,
- List,
- Struct,
- Union,
- Dictionary,
- Map,
- FixedSizeList,
- Duration,
- RecordBatch,
- BinaryView,
- StringView,
- ListView,
- }
+ public override ArrowTypeId TypeId => ArrowTypeId.LargeList;
- public interface IArrowType
- {
- ArrowTypeId TypeId { get; }
+ public override string Name => "large_list";
+
+ public Field ValueField => Fields[0];
+
+ public IArrowType ValueDataType => Fields[0].DataType;
+
+ public LargeListType(Field valueField)
+ : base(valueField) { }
- string Name { get; }
-
- void Accept(IArrowTypeVisitor visitor);
+ public LargeListType(IArrowType valueDataType)
+ : this(new Field("item", valueDataType, true)) { }
- bool IsFixedWidth { get; }
-
+ public override void Accept(IArrowTypeVisitor visitor) => Accept(this,
visitor);
}
}
diff --git a/csharp/src/Apache.Arrow/Types/LargeStringType.cs
b/csharp/src/Apache.Arrow/Types/LargeStringType.cs
new file mode 100644
index 0000000000..8698ca4747
--- /dev/null
+++ b/csharp/src/Apache.Arrow/Types/LargeStringType.cs
@@ -0,0 +1,27 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements. See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+namespace Apache.Arrow.Types;
+
+public sealed class LargeStringType : ArrowType
+{
+ public static readonly LargeStringType Default = new LargeStringType();
+
+ public override ArrowTypeId TypeId => ArrowTypeId.LargeString;
+
+ public override string Name => "large_utf8";
+
+ public override void Accept(IArrowTypeVisitor visitor) => Accept(this,
visitor);
+}
diff --git a/csharp/test/Apache.Arrow.IntegrationTest/JsonFile.cs
b/csharp/test/Apache.Arrow.IntegrationTest/JsonFile.cs
index 7232f74b8b..c9e44b8d2f 100644
--- a/csharp/test/Apache.Arrow.IntegrationTest/JsonFile.cs
+++ b/csharp/test/Apache.Arrow.IntegrationTest/JsonFile.cs
@@ -177,8 +177,10 @@ namespace Apache.Arrow.IntegrationTest
"decimal" => ToDecimalArrowType(type),
"binary" => BinaryType.Default,
"binaryview" => BinaryViewType.Default,
+ "largebinary" => LargeBinaryType.Default,
"utf8" => StringType.Default,
"utf8view" => StringViewType.Default,
+ "largeutf8" => LargeStringType.Default,
"fixedsizebinary" => new FixedSizeBinaryType(type.ByteWidth),
"date" => ToDateArrowType(type),
"time" => ToTimeArrowType(type),
@@ -188,6 +190,7 @@ namespace Apache.Arrow.IntegrationTest
"timestamp" => ToTimestampArrowType(type),
"list" => ToListArrowType(type, children),
"listview" => ToListViewArrowType(type, children),
+ "largelist" => ToLargeListArrowType(type, children),
"fixedsizelist" => ToFixedSizeListArrowType(type, children),
"struct" => ToStructArrowType(type, children),
"union" => ToUnionArrowType(type, children),
@@ -303,6 +306,11 @@ namespace Apache.Arrow.IntegrationTest
return new ListViewType(children[0]);
}
+ private static IArrowType ToLargeListArrowType(JsonArrowType type,
Field[] children)
+ {
+ return new LargeListType(children[0]);
+ }
+
private static IArrowType ToFixedSizeListArrowType(JsonArrowType type,
Field[] children)
{
return new FixedSizeListType(children[0], type.ListSize);
@@ -461,11 +469,14 @@ namespace Apache.Arrow.IntegrationTest
IArrowTypeVisitor<TimestampType>,
IArrowTypeVisitor<StringType>,
IArrowTypeVisitor<StringViewType>,
+ IArrowTypeVisitor<LargeStringType>,
IArrowTypeVisitor<BinaryType>,
IArrowTypeVisitor<BinaryViewType>,
+ IArrowTypeVisitor<LargeBinaryType>,
IArrowTypeVisitor<FixedSizeBinaryType>,
IArrowTypeVisitor<ListType>,
IArrowTypeVisitor<ListViewType>,
+ IArrowTypeVisitor<LargeListType>,
IArrowTypeVisitor<FixedSizeListType>,
IArrowTypeVisitor<StructType>,
IArrowTypeVisitor<UnionType>,
@@ -696,6 +707,24 @@ namespace Apache.Arrow.IntegrationTest
Array = new StringViewArray(arrayData);
}
+ public void Visit(LargeStringType type)
+ {
+ ArrowBuffer validityBuffer = GetValidityBuffer(out int
nullCount);
+ ArrowBuffer offsetBuffer = GetLargeOffsetBuffer();
+
+ var json = JsonFieldData.Data.GetRawText();
+ string[] values = JsonSerializer.Deserialize<string[]>(json,
s_options);
+
+ ArrowBuffer.Builder<byte> valueBuilder = new
ArrowBuffer.Builder<byte>();
+ foreach (string value in values)
+ {
+ valueBuilder.Append(Encoding.UTF8.GetBytes(value));
+ }
+ ArrowBuffer valueBuffer = valueBuilder.Build(default);
+
+ Array = new LargeStringArray(JsonFieldData.Count,
offsetBuffer, valueBuffer, validityBuffer, nullCount);
+ }
+
public void Visit(BinaryType type)
{
ArrowBuffer validityBuffer = GetValidityBuffer(out int
nullCount);
@@ -747,6 +776,25 @@ namespace Apache.Arrow.IntegrationTest
Array = new BinaryViewArray(arrayData);
}
+ public void Visit(LargeBinaryType type)
+ {
+ ArrowBuffer validityBuffer = GetValidityBuffer(out int
nullCount);
+ ArrowBuffer offsetBuffer = GetLargeOffsetBuffer();
+
+ var json = JsonFieldData.Data.GetRawText();
+ string[] values = JsonSerializer.Deserialize<string[]>(json,
s_options);
+
+ ArrowBuffer.Builder<byte> valueBuilder = new
ArrowBuffer.Builder<byte>();
+ foreach (string value in values)
+ {
+ valueBuilder.Append(ConvertHexStringToByteArray(value));
+ }
+ ArrowBuffer valueBuffer = valueBuilder.Build(default);
+
+ ArrayData arrayData = new ArrayData(type, JsonFieldData.Count,
nullCount, 0, new[] { validityBuffer, offsetBuffer, valueBuffer });
+ Array = new LargeBinaryArray(arrayData);
+ }
+
public void Visit(FixedSizeBinaryType type)
{
ArrowBuffer validityBuffer = GetValidityBuffer(out int
nullCount);
@@ -796,6 +844,21 @@ namespace Apache.Arrow.IntegrationTest
Array = new ListViewArray(arrayData);
}
+ public void Visit(LargeListType type)
+ {
+ ArrowBuffer validityBuffer = GetValidityBuffer(out int
nullCount);
+ ArrowBuffer offsetBuffer = GetLargeOffsetBuffer();
+
+ var data = JsonFieldData;
+ JsonFieldData = data.Children[0];
+ type.ValueDataType.Accept(this);
+ JsonFieldData = data;
+
+ ArrayData arrayData = new ArrayData(type, JsonFieldData.Count,
nullCount, 0,
+ new[] { validityBuffer, offsetBuffer }, new[] { Array.Data
});
+ Array = new LargeListArray(arrayData);
+ }
+
public void Visit(FixedSizeListType type)
{
ArrowBuffer validityBuffer = GetValidityBuffer(out int
nullCount);
@@ -975,6 +1038,13 @@ namespace Apache.Arrow.IntegrationTest
return valueOffsets.Build(default);
}
+ private ArrowBuffer GetLargeOffsetBuffer()
+ {
+ ArrowBuffer.Builder<long> valueOffsets = new
ArrowBuffer.Builder<long>(JsonFieldData.Offset.Count);
+ valueOffsets.AppendRange(JsonFieldData.LongOffset);
+ return valueOffsets.Build(default);
+ }
+
private ArrowBuffer GetSizeBuffer()
{
ArrowBuffer.Builder<int> valueSizes = new
ArrowBuffer.Builder<int>(JsonFieldData.Size.Count);
@@ -1039,6 +1109,12 @@ namespace Apache.Arrow.IntegrationTest
get { return Offset.Select(GetInt); }
}
+ [JsonIgnore]
+ public IEnumerable<long> LongOffset
+ {
+ get { return Offset.Select(GetLong); }
+ }
+
[JsonIgnore]
public IEnumerable<int> IntSize
{
@@ -1056,6 +1132,18 @@ namespace Apache.Arrow.IntegrationTest
return int.Parse(node.GetValue<string>());
}
}
+
+ static long GetLong(JsonNode node)
+ {
+ try
+ {
+ return node.GetValue<long>();
+ }
+ catch
+ {
+ return long.Parse(node.GetValue<string>());
+ }
+ }
}
public class JsonView
diff --git a/csharp/test/Apache.Arrow.Tests/ArrowReaderVerifier.cs
b/csharp/test/Apache.Arrow.Tests/ArrowReaderVerifier.cs
index 5c33d1fd43..85f7b75f93 100644
--- a/csharp/test/Apache.Arrow.Tests/ArrowReaderVerifier.cs
+++ b/csharp/test/Apache.Arrow.Tests/ArrowReaderVerifier.cs
@@ -95,12 +95,15 @@ namespace Apache.Arrow.Tests
IArrowArrayVisitor<MonthDayNanosecondIntervalArray>,
IArrowArrayVisitor<ListArray>,
IArrowArrayVisitor<ListViewArray>,
+ IArrowArrayVisitor<LargeListArray>,
IArrowArrayVisitor<FixedSizeListArray>,
IArrowArrayVisitor<StringArray>,
IArrowArrayVisitor<StringViewArray>,
+ IArrowArrayVisitor<LargeStringArray>,
IArrowArrayVisitor<FixedSizeBinaryArray>,
IArrowArrayVisitor<BinaryArray>,
IArrowArrayVisitor<BinaryViewArray>,
+ IArrowArrayVisitor<LargeBinaryArray>,
IArrowArrayVisitor<StructArray>,
IArrowArrayVisitor<UnionArray>,
IArrowArrayVisitor<Decimal128Array>,
@@ -144,14 +147,17 @@ namespace Apache.Arrow.Tests
public void Visit(MonthDayNanosecondIntervalArray array) =>
CompareArrays(array);
public void Visit(ListArray array) => CompareArrays(array);
public void Visit(ListViewArray array) => CompareArrays(array);
+ public void Visit(LargeListArray array) => CompareArrays(array);
public void Visit(FixedSizeListArray array) =>
CompareArrays(array);
public void Visit(FixedSizeBinaryArray array) =>
CompareArrays(array);
public void Visit(Decimal128Array array) => CompareArrays(array);
public void Visit(Decimal256Array array) => CompareArrays(array);
public void Visit(StringArray array) =>
CompareBinaryArrays<StringArray>(array);
public void Visit(StringViewArray array) =>
CompareVariadicArrays<StringViewArray>(array);
+ public void Visit(LargeStringArray array) =>
CompareLargeBinaryArrays<LargeStringArray>(array);
public void Visit(BinaryArray array) =>
CompareBinaryArrays<BinaryArray>(array);
public void Visit(BinaryViewArray array) =>
CompareVariadicArrays<BinaryViewArray>(array);
+ public void Visit(LargeBinaryArray array) =>
CompareLargeBinaryArrays<LargeBinaryArray>(array);
public void Visit(StructArray array)
{
@@ -276,6 +282,40 @@ namespace Apache.Arrow.Tests
}
}
+ private void CompareLargeBinaryArrays<T>(LargeBinaryArray
actualArray)
+ where T : IArrowArray
+ {
+ Assert.IsAssignableFrom<T>(_expectedArray);
+ Assert.IsAssignableFrom<T>(actualArray);
+
+ var expectedArray = (LargeBinaryArray)_expectedArray;
+
+ actualArray.Data.DataType.Accept(_arrayTypeComparer);
+
+ Assert.Equal(expectedArray.Length, actualArray.Length);
+ Assert.Equal(expectedArray.NullCount, actualArray.NullCount);
+
+ CompareValidityBuffer(
+ expectedArray.NullCount, _expectedArray.Length,
expectedArray.NullBitmapBuffer,
+ expectedArray.Offset, actualArray.NullBitmapBuffer,
actualArray.Offset);
+
+ if (_strictCompare)
+ {
+ Assert.Equal(expectedArray.Offset, actualArray.Offset);
+
Assert.True(expectedArray.ValueOffsetsBuffer.Span.SequenceEqual(actualArray.ValueOffsetsBuffer.Span));
+ Assert.True(expectedArray.ValueBuffer.Span.Slice(0,
expectedArray.Length).SequenceEqual(actualArray.ValueBuffer.Span.Slice(0,
actualArray.Length)));
+ }
+ else
+ {
+ for (int i = 0; i < expectedArray.Length; i++)
+ {
+ Assert.True(
+
expectedArray.GetBytes(i).SequenceEqual(actualArray.GetBytes(i)),
+ $"LargeBinaryArray values do not match at index
{i}.");
+ }
+ }
+ }
+
private void CompareVariadicArrays<T>(BinaryViewArray actualArray)
where T : IArrowArray
{
@@ -469,6 +509,44 @@ namespace Apache.Arrow.Tests
}
}
+ private void CompareArrays(LargeListArray actualArray)
+ {
+ Assert.IsAssignableFrom<LargeListArray>(_expectedArray);
+ LargeListArray expectedArray = (LargeListArray)_expectedArray;
+
+ actualArray.Data.DataType.Accept(_arrayTypeComparer);
+
+ Assert.Equal(expectedArray.Length, actualArray.Length);
+ Assert.Equal(expectedArray.NullCount, actualArray.NullCount);
+
+ CompareValidityBuffer(
+ expectedArray.NullCount, _expectedArray.Length,
expectedArray.NullBitmapBuffer,
+ expectedArray.Offset, actualArray.NullBitmapBuffer,
actualArray.Offset);
+
+ if (_strictCompare)
+ {
+ Assert.Equal(expectedArray.Offset, actualArray.Offset);
+
Assert.True(expectedArray.ValueOffsetsBuffer.Span.SequenceEqual(actualArray.ValueOffsetsBuffer.Span));
+ actualArray.Values.Accept(new
ArrayComparer(expectedArray.Values, _strictCompare));
+ }
+ else
+ {
+ for (int i = 0; i < actualArray.Length; ++i)
+ {
+ if (expectedArray.IsNull(i))
+ {
+ Assert.True(actualArray.IsNull(i));
+ }
+ else
+ {
+ var expectedList =
expectedArray.GetSlicedValues(i);
+ var actualList = actualArray.GetSlicedValues(i);
+ actualList.Accept(new ArrayComparer(expectedList,
_strictCompare));
+ }
+ }
+ }
+ }
+
private void CompareArrays(FixedSizeListArray actualArray)
{
Assert.IsAssignableFrom<FixedSizeListArray>(_expectedArray);
diff --git a/csharp/test/Apache.Arrow.Tests/LargeBinaryArrayTests.cs
b/csharp/test/Apache.Arrow.Tests/LargeBinaryArrayTests.cs
new file mode 100644
index 0000000000..4ee1f1d0e0
--- /dev/null
+++ b/csharp/test/Apache.Arrow.Tests/LargeBinaryArrayTests.cs
@@ -0,0 +1,95 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements. See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+using System;
+using System.Collections.Generic;
+using Apache.Arrow.Types;
+using Xunit;
+
+namespace Apache.Arrow.Tests;
+
+public class LargeBinaryArrayTests
+{
+ [Fact]
+ public void GetBytesReturnsCorrectValue()
+ {
+ var byteArrays = new byte[][]
+ {
+ new byte[] {0, 1, 2, 255},
+ new byte[] {3, 4, 5},
+ new byte[] {},
+ null,
+ new byte[] {254, 253, 252},
+ };
+ var array = BuildArray(byteArrays);
+
+ Assert.Equal(array.Length, byteArrays.Length);
+ for (var i = 0; i < byteArrays.Length; ++i)
+ {
+ var byteSpan = array.GetBytes(i, out var isNull);
+ var byteArray = isNull ? null : byteSpan.ToArray();
+ Assert.Equal(byteArrays[i], byteArray);
+ }
+ }
+
+ [Fact]
+ public void GetBytesChecksForOffsetOverflow()
+ {
+ var valueBuffer = new ArrowBuffer.Builder<byte>();
+ var offsetBuffer = new ArrowBuffer.Builder<long>();
+ var validityBuffer = new ArrowBuffer.BitmapBuilder();
+
+ offsetBuffer.Append(0);
+ offsetBuffer.Append((long)int.MaxValue + 1);
+ validityBuffer.Append(true);
+
+ var array = new LargeBinaryArray(
+ LargeBinaryType.Default, length: 1,
+ offsetBuffer.Build(), valueBuffer.Build(), validityBuffer.Build(),
+ validityBuffer.UnsetBitCount);
+
+ Assert.Throws<OverflowException>(() => array.GetBytes(0));
+ }
+
+ private static LargeBinaryArray BuildArray(IReadOnlyCollection<byte[]>
byteArrays)
+ {
+ var valueBuffer = new ArrowBuffer.Builder<byte>();
+ var offsetBuffer = new ArrowBuffer.Builder<long>();
+ var validityBuffer = new ArrowBuffer.BitmapBuilder();
+
+ long offset = 0;
+ offsetBuffer.Append(offset);
+ foreach (var bytes in byteArrays)
+ {
+ if (bytes == null)
+ {
+ validityBuffer.Append(false);
+ offsetBuffer.Append(offset);
+ }
+ else
+ {
+ valueBuffer.Append(bytes);
+ offset += bytes.Length;
+ offsetBuffer.Append(offset);
+ validityBuffer.Append(true);
+ }
+ }
+
+ return new LargeBinaryArray(
+ LargeBinaryType.Default, byteArrays.Count,
+ offsetBuffer.Build(), valueBuffer.Build(), validityBuffer.Build(),
+ validityBuffer.UnsetBitCount);
+ }
+}
diff --git a/csharp/test/Apache.Arrow.Tests/LargeListArrayTests.cs
b/csharp/test/Apache.Arrow.Tests/LargeListArrayTests.cs
new file mode 100644
index 0000000000..1d35a8ffd6
--- /dev/null
+++ b/csharp/test/Apache.Arrow.Tests/LargeListArrayTests.cs
@@ -0,0 +1,105 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements. See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+using System;
+using System.Linq;
+using Apache.Arrow.Types;
+using Xunit;
+
+namespace Apache.Arrow.Tests;
+
+public class LargeListArrayTests
+{
+ [Fact]
+ public void GetSlicedValuesReturnsCorrectValues()
+ {
+ var values = new int?[][]
+ {
+ new int?[] {0, 1, 2},
+ System.Array.Empty<int?>(),
+ null,
+ new int?[] {3, 4, null, 6},
+ };
+
+ var array = BuildArray(values);
+
+ Assert.Equal(values.Length, array.Length);
+ for (int i = 0; i < values.Length; ++i)
+ {
+ Assert.Equal(values[i] == null, array.IsNull(i));
+ var arrayItem = (Int32Array) array.GetSlicedValues(i);
+ if (values[i] == null)
+ {
+ Assert.Null(arrayItem);
+ }
+ else
+ {
+ Assert.Equal(values[i], arrayItem.ToArray());
+ }
+ }
+ }
+
+ [Fact]
+ public void GetSlicedValuesChecksForOffsetOverflow()
+ {
+ var valuesArray = new Int32Array.Builder().Build();
+ var offsetBuffer = new ArrowBuffer.Builder<long>();
+ var validityBuffer = new ArrowBuffer.BitmapBuilder();
+
+ offsetBuffer.Append(0);
+ offsetBuffer.Append((long)int.MaxValue + 1);
+ validityBuffer.Append(true);
+
+ var array = new LargeListArray(
+ new LargeListType(new Int32Type()), length: 1,
+ offsetBuffer.Build(), valuesArray, validityBuffer.Build(),
+ validityBuffer.UnsetBitCount);
+
+ Assert.Throws<OverflowException>(() => array.GetSlicedValues(0));
+ }
+
+ private static LargeListArray BuildArray(int?[][] values)
+ {
+ var valuesBuilder = new Int32Array.Builder();
+ var offsetBuffer = new ArrowBuffer.Builder<long>();
+ var validityBuffer = new ArrowBuffer.BitmapBuilder();
+
+ long offset = 0;
+ offsetBuffer.Append(offset);
+ foreach (var listValue in values)
+ {
+ if (listValue == null)
+ {
+ validityBuffer.Append(false);
+ offsetBuffer.Append(offset);
+ }
+ else
+ {
+ foreach (var value in listValue)
+ {
+ valuesBuilder.Append(value);
+ }
+ offset += listValue.Length;
+ offsetBuffer.Append(offset);
+ validityBuffer.Append(true);
+ }
+ }
+
+ return new LargeListArray(
+ new LargeListType(new Int32Type()), values.Length,
+ offsetBuffer.Build(), valuesBuilder.Build(),
validityBuffer.Build(),
+ validityBuffer.UnsetBitCount);
+ }
+}
diff --git a/csharp/test/Apache.Arrow.Tests/LargeStringArrayTests.cs
b/csharp/test/Apache.Arrow.Tests/LargeStringArrayTests.cs
new file mode 100644
index 0000000000..aba97ba338
--- /dev/null
+++ b/csharp/test/Apache.Arrow.Tests/LargeStringArrayTests.cs
@@ -0,0 +1,91 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements. See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+using System;
+using System.Collections.Generic;
+using Xunit;
+
+namespace Apache.Arrow.Tests;
+
+public class LargeStringArrayTests
+{
+ [Fact]
+ public void GetStringReturnsCorrectValue()
+ {
+ var strings = new string[]
+ {
+ "abc",
+ "defg",
+ "",
+ null,
+ "123",
+ };
+ var array = BuildArray(strings);
+
+ Assert.Equal(array.Length, strings.Length);
+ for (var i = 0; i < strings.Length; ++i)
+ {
+ Assert.Equal(strings[i], array.GetString(i));
+ }
+ }
+
+ [Fact]
+ public void GetStringChecksForOffsetOverflow()
+ {
+ var valueBuffer = new ArrowBuffer.Builder<byte>();
+ var offsetBuffer = new ArrowBuffer.Builder<long>();
+ var validityBuffer = new ArrowBuffer.BitmapBuilder();
+
+ offsetBuffer.Append(0);
+ offsetBuffer.Append((long)int.MaxValue + 1);
+ validityBuffer.Append(true);
+
+ var array = new LargeStringArray(
+ length: 1, offsetBuffer.Build(), valueBuffer.Build(),
validityBuffer.Build(),
+ validityBuffer.UnsetBitCount);
+
+ Assert.Throws<OverflowException>(() => array.GetString(0));
+ }
+
+ private static LargeStringArray BuildArray(IReadOnlyCollection<string>
strings)
+ {
+ var valueBuffer = new ArrowBuffer.Builder<byte>();
+ var offsetBuffer = new ArrowBuffer.Builder<long>();
+ var validityBuffer = new ArrowBuffer.BitmapBuilder();
+
+ long offset = 0;
+ offsetBuffer.Append(offset);
+ foreach (var value in strings)
+ {
+ if (value == null)
+ {
+ validityBuffer.Append(false);
+ offsetBuffer.Append(offset);
+ }
+ else
+ {
+ var bytes = LargeStringArray.DefaultEncoding.GetBytes(value);
+ valueBuffer.Append(bytes);
+ offset += value.Length;
+ offsetBuffer.Append(offset);
+ validityBuffer.Append(true);
+ }
+ }
+
+ return new LargeStringArray(
+ strings.Count, offsetBuffer.Build(), valueBuffer.Build(),
validityBuffer.Build(),
+ validityBuffer.UnsetBitCount);
+ }
+}
diff --git a/csharp/test/Apache.Arrow.Tests/TableTests.cs
b/csharp/test/Apache.Arrow.Tests/TableTests.cs
index 83c88265d1..35fbe7cba6 100644
--- a/csharp/test/Apache.Arrow.Tests/TableTests.cs
+++ b/csharp/test/Apache.Arrow.Tests/TableTests.cs
@@ -63,9 +63,9 @@ namespace Apache.Arrow.Tests
Table table1 = Table.TableFromRecordBatches(recordBatch1.Schema,
recordBatches);
Assert.Equal(20, table1.RowCount);
#if NET5_0_OR_GREATER
- Assert.Equal(35, table1.ColumnCount);
+ Assert.Equal(38, table1.ColumnCount);
#else
- Assert.Equal(34, table1.ColumnCount);
+ Assert.Equal(37, table1.ColumnCount);
#endif
Assert.Equal("ChunkedArray: Length=20, DataType=list",
table1.Column(0).Data.ToString());
diff --git a/csharp/test/Apache.Arrow.Tests/TestData.cs
b/csharp/test/Apache.Arrow.Tests/TestData.cs
index 3ea42ee0fb..36969766ae 100644
--- a/csharp/test/Apache.Arrow.Tests/TestData.cs
+++ b/csharp/test/Apache.Arrow.Tests/TestData.cs
@@ -49,6 +49,7 @@ namespace Apache.Arrow.Tests
{
AddField(CreateField(new ListType(Int64Type.Default), i));
AddField(CreateField(new ListViewType(Int64Type.Default), i));
+ AddField(CreateField(new LargeListType(Int64Type.Default), i));
AddField(CreateField(BooleanType.Default, i));
AddField(CreateField(UInt8Type.Default, i));
AddField(CreateField(Int8Type.Default, i));
@@ -84,6 +85,8 @@ namespace Apache.Arrow.Tests
AddField(CreateField(new UnionType(new[] {
CreateField(StringType.Default, i), CreateField(Int32Type.Default, i) }, new[]
{ 0, 1 }, UnionMode.Sparse), i));
AddField(CreateField(new UnionType(new[] {
CreateField(StringType.Default, i), CreateField(Int32Type.Default, i) }, new[]
{ 0, 1 }, UnionMode.Dense), -i));
AddField(CreateField(new DictionaryType(Int32Type.Default,
StringType.Default, false), i));
+ AddField(CreateField(new LargeBinaryType(), i));
+ AddField(CreateField(new LargeStringType(), i));
}
Schema schema = builder.Build();
@@ -144,8 +147,10 @@ namespace Apache.Arrow.Tests
IArrowTypeVisitor<TimestampType>,
IArrowTypeVisitor<StringType>,
IArrowTypeVisitor<StringViewType>,
+ IArrowTypeVisitor<LargeStringType>,
IArrowTypeVisitor<ListType>,
IArrowTypeVisitor<ListViewType>,
+ IArrowTypeVisitor<LargeListType>,
IArrowTypeVisitor<FixedSizeListType>,
IArrowTypeVisitor<StructType>,
IArrowTypeVisitor<UnionType>,
@@ -154,6 +159,7 @@ namespace Apache.Arrow.Tests
IArrowTypeVisitor<DictionaryType>,
IArrowTypeVisitor<BinaryType>,
IArrowTypeVisitor<BinaryViewType>,
+ IArrowTypeVisitor<LargeBinaryType>,
IArrowTypeVisitor<FixedSizeBinaryType>,
IArrowTypeVisitor<MapType>,
IArrowTypeVisitor<IntervalType>,
@@ -335,6 +341,45 @@ namespace Apache.Arrow.Tests
Array = builder.Build();
}
+ public void Visit(LargeStringType type)
+ {
+ var str = "hello";
+ var valueBuffer = new ArrowBuffer.Builder<byte>();
+ var offsetBuffer = new ArrowBuffer.Builder<long>();
+ var validityBuffer = new ArrowBuffer.BitmapBuilder();
+
+ long offset = 0;
+ offsetBuffer.Append(offset);
+
+ for (var i = 0; i < Length; i++)
+ {
+ switch (i % 3)
+ {
+ case 0:
+ offsetBuffer.Append(offset);
+ validityBuffer.Append(false);
+ break;
+ case 1:
+
valueBuffer.Append(LargeStringArray.DefaultEncoding.GetBytes(str));
+ offset += str.Length;
+ offsetBuffer.Append(offset);
+ validityBuffer.Append(true);
+ break;
+ case 2:
+
valueBuffer.Append(LargeStringArray.DefaultEncoding.GetBytes(str + str));
+ offset += str.Length * 2;
+ offsetBuffer.Append(offset);
+ validityBuffer.Append(true);
+ break;
+ }
+ }
+
+ var validity = validityBuffer.UnsetBitCount > 0 ?
validityBuffer.Build() : ArrowBuffer.Empty;
+ Array = new LargeStringArray(
+ Length, offsetBuffer.Build(), valueBuffer.Build(),
validity,
+ validityBuffer.UnsetBitCount);
+ }
+
public void Visit(ListType type)
{
var builder = new
ListArray.Builder(type.ValueField).Reserve(Length);
@@ -379,6 +424,37 @@ namespace Apache.Arrow.Tests
Array = builder.Build();
}
+ public void Visit(LargeListType type)
+ {
+ var valueBuilder = new Int64Array.Builder().Reserve(Length * 3
/ 2);
+ var offsetBuffer = new ArrowBuffer.Builder<long>();
+ var validityBuffer = new ArrowBuffer.BitmapBuilder();
+
+ offsetBuffer.Append(0);
+
+ for (var i = 0; i < Length; i++)
+ {
+ if (i % 10 == 2)
+ {
+ offsetBuffer.Append(valueBuilder.Length);
+ validityBuffer.Append(false);
+ }
+ else
+ {
+ var listLength = i % 4;
+ valueBuilder.AppendRange(Enumerable.Range(i,
listLength).Select(x => (long)x));
+ offsetBuffer.Append(valueBuilder.Length);
+ validityBuffer.Append(true);
+ }
+ }
+
+ var validity = validityBuffer.UnsetBitCount > 0 ?
validityBuffer.Build() : ArrowBuffer.Empty;
+ Array = new LargeListArray(
+ new LargeListType(new Int64Type()), Length,
+ offsetBuffer.Build(), valueBuilder.Build(), validity,
+ validityBuffer.UnsetBitCount);
+ }
+
public void Visit(FixedSizeListType type)
{
var builder = new FixedSizeListArray.Builder(type.ValueField,
type.ListSize).Reserve(Length);
@@ -554,6 +630,48 @@ namespace Apache.Arrow.Tests
Array = builder.Build();
}
+ public void Visit(LargeBinaryType type)
+ {
+ ReadOnlySpan<byte> shortData = new[] { (byte)0, (byte)1,
(byte)2, (byte)3, (byte)4, (byte)5, (byte)6, (byte)7, (byte)8, (byte)9 };
+ ReadOnlySpan<byte> longData = new[]
+ {
+ (byte)0, (byte)1, (byte)2, (byte)3, (byte)4, (byte)5,
(byte)6, (byte)7, (byte)8, (byte)9,
+ (byte)10, (byte)11, (byte)12, (byte)13, (byte)14,
(byte)15, (byte)16, (byte)17, (byte)18, (byte)19
+ };
+ var valueBuffer = new ArrowBuffer.Builder<byte>();
+ var offsetBuffer = new ArrowBuffer.Builder<long>();
+ var validityBuffer = new ArrowBuffer.BitmapBuilder();
+
+ offsetBuffer.Append(0L);
+
+ for (var i = 0; i < Length; i++)
+ {
+ switch (i % 3)
+ {
+ case 0:
+ offsetBuffer.Append(valueBuffer.Length);
+ validityBuffer.Append(false);
+ break;
+ case 1:
+ valueBuffer.Append(shortData);
+ offsetBuffer.Append(valueBuffer.Length);
+ validityBuffer.Append(true);
+ break;
+ case 2:
+ valueBuffer.Append(longData);
+ offsetBuffer.Append(valueBuffer.Length);
+ validityBuffer.Append(true);
+ break;
+ }
+ }
+
+ var validity = validityBuffer.UnsetBitCount > 0 ?
validityBuffer.Build() : ArrowBuffer.Empty;
+ Array = new LargeBinaryArray(
+ LargeBinaryType.Default, Length,
+ offsetBuffer.Build(), valueBuffer.Build(), validity,
+ validityBuffer.UnsetBitCount);
+ }
+
public void Visit(FixedSizeBinaryType type)
{
ArrowBuffer.Builder<byte> valueBuilder = new
ArrowBuffer.Builder<byte>();
diff --git a/dev/archery/archery/integration/datagen.py
b/dev/archery/archery/integration/datagen.py
index b51f3d876f..47310c905a 100644
--- a/dev/archery/archery/integration/datagen.py
+++ b/dev/archery/archery/integration/datagen.py
@@ -1872,8 +1872,7 @@ def get_generated_json_files(tempdir=None):
generate_primitive_case([17, 20], name='primitive'),
generate_primitive_case([0, 0, 0], name='primitive_zerolength'),
- generate_primitive_large_offsets_case([17, 20])
- .skip_tester('C#'),
+ generate_primitive_large_offsets_case([17, 20]),
generate_null_case([10, 0]),
@@ -1906,7 +1905,6 @@ def get_generated_json_files(tempdir=None):
generate_recursive_nested_case(),
generate_nested_large_offsets_case()
- .skip_tester('C#')
.skip_tester('JS'),
generate_unions_case(),
diff --git a/docs/source/status.rst b/docs/source/status.rst
index 2663811756..c232aa280b 100644
--- a/docs/source/status.rst
+++ b/docs/source/status.rst
@@ -62,11 +62,11 @@ Data Types
+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
| Binary | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
| ✓ |
+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
-| Large Binary | ✓ | ✓ | ✓ | ✓ | | ✓ | ✓ |
| ✓ |
+| Large Binary | ✓ | ✓ | ✓ | ✓ | \(4) | ✓ | ✓ |
| ✓ |
+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
| Utf8 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
| ✓ |
+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
-| Large Utf8 | ✓ | ✓ | ✓ | ✓ | | ✓ | ✓ |
| ✓ |
+| Large Utf8 | ✓ | ✓ | ✓ | ✓ | \(4) | ✓ | ✓ |
| ✓ |
+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
| Binary View | ✓ | | ✓ | | ✓ | | |
| |
+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
@@ -85,7 +85,7 @@ Data Types
+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
| List | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| ✓ |
+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
-| Large List | ✓ | ✓ | ✓ | | | ✓ | ✓ |
| ✓ |
+| Large List | ✓ | ✓ | ✓ | | \(4) | ✓ | ✓ |
| ✓ |
+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
| List View | ✓ | | ✓ | | ✓ | | |
| |
+-------------------+-------+-------+-------+----+-------+-------+-------+-------+-----------+
@@ -125,6 +125,8 @@ Notes:
* \(1) Casting to/from Float16 in Java is not supported.
* \(2) Float16 support in C# is only available when targeting .NET 6+.
* \(3) Nested dictionaries not supported
+* \(4) C# large array types are provided to help with interoperability with
other libraries,
+ but these do not support buffers larger than 2 GiB and an exception will be
raised if trying to import an array that is too large.
.. seealso::
The :ref:`format_columnar` and the