CurtHagenlocher commented on code in PR #284: URL: https://github.com/apache/arrow-dotnet/pull/284#discussion_r2935976145
########## src/Apache.Arrow.Serialization/README.md: ########## @@ -0,0 +1,637 @@ +<!--- +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to You under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> + +# Apache.Arrow.Serialization + +Source-generated [Apache Arrow](https://arrow.apache.org/) serialization for .NET. + +Mark any type with `[ArrowSerializable]` and a Roslyn source generator will emit compile-time Arrow schema derivation, serialization, and deserialization — zero reflection on the hot path, fully AOT-compatible. 
+ +```csharp +[ArrowSerializable] +public partial record Person +{ + public string Name { get; init; } = ""; + public int Age { get; init; } +} + +// Single-row +var batch = Person.ToRecordBatch(new Person { Name = "Alice", Age = 30 }); +var alice = Person.FromRecordBatch(batch); + +// Multi-row +var people = new[] { alice, new Person { Name = "Bob", Age = 25 } }; +var table = Person.ToRecordBatch(people); +IReadOnlyList<Person> restored = Person.ListFromRecordBatch(table); + +// Arrow IPC bytes (cross-language compatible) +byte[] bytes = alice.SerializeToBytes(); +var roundTrip = ArrowSerializerExtensions.DeserializeFromBytes<Person>(bytes); +``` + +## Table of Contents + +- [Installation](#installation) +- [Quick Start](#quick-start) +- [Supported Types](#supported-types) + - [Type Declarations](#type-declarations) + - [Built-in Type Mappings](#built-in-type-mappings) + - [Collections](#collections) + - [Nullable Types](#nullable-types) +- [Attributes](#attributes) + - [ArrowSerializable](#arrowserializable) + - [ArrowField](#arrowfield) + - [ArrowType](#arrowtype) + - [ArrowIgnore](#arrowignore) + - [ArrowMetadata](#arrowmetadata) +- [Nested Types](#nested-types) +- [Readonly Fields and Constructors](#readonly-fields-and-constructors) +- [Enum Serialization](#enum-serialization) +- [Polymorphism](#polymorphism) +- [Custom Converters](#custom-converters) +- [Serialization Callbacks](#serialization-callbacks) +- [JSON Schema Emission](#json-schema-emission) +- [RecordBatchBuilder (Reflection-Based)](#recordbatchbuilder-reflection-based) +- [Extension Methods](#extension-methods) +- [Source Generator Diagnostics](#source-generator-diagnostics) +- [Cross-Language Compatibility](#cross-language-compatibility) + +## Installation + +``` +dotnet add package Apache.Arrow.Serialization +``` + +The NuGet package includes both the runtime library and the Roslyn source generator. Targets `net8.0`. + +## Quick Start + +1. Add `[ArrowSerializable]` to your type +2. 
Make the type `partial` (required for source generation) +3. The generator emits `IArrowSerializer<T>` — giving you `ArrowSchema`, `ToRecordBatch`, `FromRecordBatch`, and `ListFromRecordBatch` + +```csharp +using Apache.Arrow.Serialization; + +[ArrowSerializable] +public partial record SensorReading +{ + public string SensorId { get; init; } = ""; + public double Temperature { get; init; } + public DateTime Timestamp { get; init; } +} +``` + +The source generator produces a `partial` implementation with these static members: + +```csharp +partial record SensorReading : IArrowSerializer<SensorReading> +{ + public static Schema ArrowSchema { get; } + public static RecordBatch ToRecordBatch(SensorReading value); + public static SensorReading FromRecordBatch(RecordBatch batch); + public static RecordBatch ToRecordBatch(IReadOnlyList<SensorReading> values); + public static IReadOnlyList<SensorReading> ListFromRecordBatch(RecordBatch batch); +} +``` + +## Supported Types + +### Type Declarations + +All four C# type kinds are supported: + +```csharp +[ArrowSerializable] +public partial record MyRecord { ... } + +[ArrowSerializable] +public partial record struct MyRecordStruct { ... } + +[ArrowSerializable] +public partial class MyClass { ... } + +[ArrowSerializable] +public partial struct MyStruct { ... } +``` + +Records use `{ get; init; }` properties. Classes and structs use `{ get; set; }`. 
+ +### Built-in Type Mappings + +| C# Type | Arrow Type | Notes | +|---------|-----------|-------| +| `string` | `Utf8` | Override to `StringView` via `[ArrowType("string_view")]` | +| `bool` | `Boolean` | Override to `Bool8` via `[ArrowType("bool8")]` | +| `byte` | `UInt8` | | +| `sbyte` | `Int8` | | +| `short` | `Int16` | | +| `ushort` | `UInt16` | | +| `int` | `Int32` | | +| `uint` | `UInt32` | | +| `long` | `Int64` | | +| `ulong` | `UInt64` | | +| `float` | `Float32` | | +| `double` | `Float64` | | +| `Half` | `Float16` | | +| `decimal` | `Decimal128(38, 18)` | Configurable via `[ArrowType("decimal128(28, 10)")]` | Review Comment: It might be worth pointing out in the documentation that a CLR decimal is not a perfect match for an Arrow decimal. ########## Apache.Arrow.sln: ########## @@ -29,64 +29,214 @@ Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Apache.Arrow.Flight.Integra EndProject Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Apache.Arrow.IntegrationTest", "test\Apache.Arrow.IntegrationTest\Apache.Arrow.IntegrationTest.csproj", "{E8264B7F-B680-4A55-939B-85DB628164BB}" EndProject +Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Apache.Arrow.Serialization", "src\Apache.Arrow.Serialization\Apache.Arrow.Serialization.csproj", "{E0C418BE-DD55-4FB1-973E-272B142BAA9E}" +EndProject +Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Apache.Arrow.Serialization.Generator", "src\Apache.Arrow.Serialization.Generator\Apache.Arrow.Serialization.Generator.csproj", "{FD8B13D7-16F4-4DBF-BB25-13EA5131EE03}" +EndProject +Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Apache.Arrow.Serialization.Tests", "test\Apache.Arrow.Serialization.Tests\Apache.Arrow.Serialization.Tests.csproj", "{3726633C-7093-40A1-8ABB-13A5CD64033A}" +EndProject Global GlobalSection(SolutionConfigurationPlatforms) = preSolution Debug|Any CPU = Debug|Any CPU + Debug|x64 = Debug|x64 Review Comment: It would be nice to avoid adding all these targets. 
Is there something bitness-specific in these changes? ########## test/Apache.Arrow.Serialization.Tests/DiagnosticTests.cs: ########## @@ -0,0 +1,255 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +using System.Collections.Immutable; +using Microsoft.CodeAnalysis; +using Microsoft.CodeAnalysis.CSharp; +using Apache.Arrow.Serialization.Generator; +using Xunit; + +namespace Apache.Arrow.Serialization.Tests; + +public class DiagnosticTests Review Comment: Nice tests! ########## src/Apache.Arrow.Serialization/RecordBatchBuilder.cs: ########## @@ -0,0 +1,711 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. 
You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +using System.Diagnostics.CodeAnalysis; +using System.Reflection; +using Apache.Arrow; +using Apache.Arrow.Arrays; +using Apache.Arrow.Types; + +namespace Apache.Arrow.Serialization; + +/// <summary> +/// Reflection-based serializer for converting arbitrary .NET objects (including anonymous types) +/// to Arrow RecordBatches. Analogous to System.Text.Json's reflection-based path — +/// works without attributes or source generation but is not AOT-safe. +/// </summary> +public static class RecordBatchBuilder +{ + /// <summary> + /// Convert a collection of objects to a RecordBatch. Schema is inferred from the + /// public readable properties of <typeparamref name="T"/>. + /// Works with anonymous types, records, classes, and structs. + /// </summary> + [RequiresUnreferencedCode("Uses reflection to inspect properties. Use [ArrowSerializable] for AOT-safe serialization.")] + public static RecordBatch FromObjects<T>(IEnumerable<T> items) + { + var list = items as IReadOnlyList<T> ?? items.ToList(); + if (list.Count == 0) + throw new ArgumentException("Cannot infer schema from empty collection.", nameof(items)); Review Comment: Is this really true though? We use the type to infer the schema, not the data. It would be annoying for someone to have to special case an empty list if they want to serialize it. 
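To illustrate the point about empty input, here is a minimal sketch using only the .NET standard library and no Arrow types. `InferColumns` and `Describe` are hypothetical helpers, not code from the PR. They assume, as the comment above says, that the column layout comes from `typeof(T)` and not from the items, so a zero-row input still yields a full set of columns.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical helper (not from the PR): derive column names and CLR types
// from typeof(T) alone. No item is ever inspected.
static (string Name, Type ClrType)[] InferColumns<T>() =>
    typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance)
        .Where(p => p.CanRead)
        .Select(p => (Name: p.Name, ClrType: p.PropertyType))
        .ToArray();

// Hypothetical stand-in for FromObjects<T>: reports the inferred column count
// and the row count instead of building a RecordBatch.
static string Describe<T>(IEnumerable<T> items)
{
    var rowCount = items.Count();
    var columns = InferColumns<T>();
    return $"{columns.Length} columns, {rowCount} rows";
}

// An empty sequence of an anonymous type: T is still known at compile time.
var noReadings = new[] { new { SensorId = "s1", Temperature = 21.5 } }.Take(0);
Console.WriteLine(Describe(noReadings)); // prints "2 columns, 0 rows"
```

If the real method follows the same shape, the `list.Count == 0` guard could probably be dropped and an empty batch returned with the inferred schema, though that depends on how the column builders in the PR handle zero rows.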
########## src/Apache.Arrow.Serialization/Apache.Arrow.Serialization.csproj: ########## @@ -0,0 +1,20 @@ +<Project Sdk="Microsoft.NET.Sdk"> + + <PropertyGroup> + <TargetFramework>net8.0</TargetFramework> Review Comment: What would it take to make this work for .NET 4.7.2? Is that even plausible? ########## src/Apache.Arrow.Serialization/RecordBatchBuilder.cs: ########## @@ -0,0 +1,711 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +using System.Diagnostics.CodeAnalysis; +using System.Reflection; +using Apache.Arrow; +using Apache.Arrow.Arrays; +using Apache.Arrow.Types; + +namespace Apache.Arrow.Serialization; + +/// <summary> +/// Reflection-based serializer for converting arbitrary .NET objects (including anonymous types) +/// to Arrow RecordBatches. Analogous to System.Text.Json's reflection-based path — +/// works without attributes or source generation but is not AOT-safe. +/// </summary> +public static class RecordBatchBuilder +{ + /// <summary> + /// Convert a collection of objects to a RecordBatch. Schema is inferred from the + /// public readable properties of <typeparamref name="T"/>. + /// Works with anonymous types, records, classes, and structs. 
+ /// </summary> + [RequiresUnreferencedCode("Uses reflection to inspect properties. Use [ArrowSerializable] for AOT-safe serialization.")] + public static RecordBatch FromObjects<T>(IEnumerable<T> items) + { + var list = items as IReadOnlyList<T> ?? items.ToList(); + if (list.Count == 0) + throw new ArgumentException("Cannot infer schema from empty collection.", nameof(items)); + + var properties = typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance) + .Where(p => p.CanRead) + .ToArray(); + + var fields = new List<Field>(); + var builders = new List<IColumnBuilder>(); + + foreach (var prop in properties) + { + var propType = prop.PropertyType; + var (arrowType, nullable) = InferArrowType(propType); + fields.Add(new Field(prop.Name, arrowType, nullable)); + builders.Add(CreateColumnBuilder(propType, arrowType)); + } + + var schema = new Schema.Builder(); Review Comment: Consider moving `schema` above the `foreach` and adding the fields directly into the schema builder instead of a temporary list. ########## src/Apache.Arrow.Serialization.Generator/Models.cs: ########## @@ -0,0 +1,151 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +using System.Collections.Generic; + +#nullable enable + +namespace Apache.Arrow.Serialization.Generator +{ + internal enum TypeKind2 Review Comment: Consider a more descriptive name. What about `ArrowTypeKind`?
