luisquintanilla opened a new issue, #375:
URL: https://github.com/apache/arrow-dotnet/issues/375

   ### Describe the enhancement requested
   
   ## Summary
   
   Implement a small, well-tested compute/kernel library that uses System.Numerics.Tensors (TensorPrimitives and Tensor<T>/TensorSpan<T>) to provide fast aggregation and elementwise kernels for Arrow PrimitiveArray<T>. This turns the repo's documented TODO ("no compute / kernel abstraction") into a working, multi-targeted, SIMD-accelerated compute layer, with scalar fallbacks on older target frameworks (netstandard2.0, net462).
   
   ## Goals
   
   - Provide production-quality Sum/Min/Max/Mean aggregation kernels for 
int32/int64/float/double (generic where net8+ allows).  
   - Provide elementwise Add/Subtract/Multiply/Divide kernels (scaffolded) that 
produce new PrimitiveArray<T> results.  
   - Implement a simple Arrow Tensor wrapper that can be constructed from 
Flatbuf Tensor messages and exposed as TensorSpan<T>/Tensor<T> on net8+ 
(wrapping existing buffers without copying where safe).  
   - Add unit tests and benchmarks.  
   - Keep changes additive, behind #if NET8_0_OR_GREATER where necessary, and 
preserve scalar fallbacks for netstandard2.0/net462.
   
   ## Current State
   
   - PrimitiveArray<T> already exposes ReadOnlySpan<T> Values (no marshaling 
needed) and there is an existing MemoryAllocator to allocate result buffers.  
   - System.Numerics.Tensors offers span-based SIMD-accelerated primitives 
(TensorPrimitives) that match Arrow's memory model.  
   - This issue defines exact file additions/edits, code skeletons, tests, and 
a PR checklist so a junior dev can implement and close it.
   
   ## Scope / Acceptance criteria
   
   1. New project: src/Apache.Arrow.Compute added and referenced from 
solutions/projects where appropriate.  
   2. Directory.Packages.props updated to include System.Numerics.Tensors 
v10.0.9 (or later stable).  
   3. Aggregations: Add Sum/Min/Max/Mean for int, long, float, double.  
      - Fast path: when array.NullCount == 0, call TensorPrimitives.* on Values.
      - Null-aware path: when NullCount > 0, a correct scalar fallback that skips elements whose validity bit is unset.
   4. Elementwise: scaffold Add/Subtract/Multiply/Divide APIs producing a new 
PrimitiveArray<T> value buffer and combined validity bitmap.  
   5. Arrow Tensor: Add a small ArrowTensor wrapper class that parses Flatbuf 
tensor metadata and provides an API to convert to TensorSpan<T> on net8+ 
without copying (where memory/strides allow).  
   6. Unit tests in test/Apache.Arrow.Compute.Tests covering correctness 
(nulls, empty, single-element, large arrays).  
   7. Benchmark project benchmarks/Apache.Arrow.Compute.Benchmarks using 
BenchmarkDotNet that compares scalar vs. TensorPrimitives fast path for large 
arrays.  
   8. Documentation: README.md updated to list Compute support and sample usage 
code.  
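   9. CI: make sure `dotnet test` passes locally for net8.0 and net462. Test projects cannot target netstandard2.0, so the net462 run (Windows) is what exercises the non-generic fallback code that the netstandard2.0 build also compiles.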
   
   ## Files to add / modify (exact, shovel-ready)
   
   1) Directory.Packages.props (modify)
   
   Add inside `<ItemGroup>` alongside the other package versions:
   
   `<PackageVersion Include="System.Numerics.Tensors" Version="10.0.9" />`
   
   (If repo policy requires newer, update to latest stable; v10.x is stable.)
   
   2) New project: src/Apache.Arrow.Compute/Apache.Arrow.Compute.csproj
   
   Create a new project file with these contents:
   
   ```xml
   <Project Sdk="Microsoft.NET.Sdk">
     <PropertyGroup>
       <AllowUnsafeBlocks>true</AllowUnsafeBlocks>
       <TargetFrameworks>netstandard2.0;net8.0;net462</TargetFrameworks>
       <Description>Arrow compute kernels: aggregations and elementwise ops backed by System.Numerics.Tensors</Description>
     </PropertyGroup>
   
     <ItemGroup>
       <PackageReference Include="System.Numerics.Tensors" />
     </ItemGroup>
   
     <ItemGroup>
       <ProjectReference Include="..\Apache.Arrow\Apache.Arrow.csproj" />
     </ItemGroup>
   </Project>
   ```
   
   3) Source: src/Apache.Arrow.Compute/Aggregations.cs
   
   Create this file (complete, copy-paste friendly). It contains net8-optimized 
generic implementations and scalar fallbacks for older TFMs.
   
   ```csharp
   using System;
   using System.Numerics.Tensors;
   using Apache.Arrow;
   
   namespace Apache.Arrow.Compute
   {
       public static class Aggregations
       {
   #if NET8_0_OR_GREATER
           public static T Sum<T>(this PrimitiveArray<T> array)
               where T : unmanaged, System.Numerics.INumber<T>
           {
               ReadOnlySpan<T> values = array.Values;
               if (array.NullCount == 0)
               {
                   // Fast SIMD path
                   return TensorPrimitives.Sum(values);
               }
   
               // Null-aware fallback: scalar loop respecting validity bitmap
               T acc = T.Zero;
               for (int i = 0; i < values.Length; i++)
               {
                   if (array.IsValid(i))
                       acc += values[i];
               }
               return acc;
           }
   
           public static T Min<T>(this PrimitiveArray<T> array)
               where T : unmanaged, System.Numerics.INumber<T>
           {
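               ReadOnlySpan<T> values = array.Values;
               // TensorPrimitives.Min throws ArgumentException on an empty span, so empty
               // input falls through to the loop and gets the same exception as all-null input.
               if (array.NullCount == 0 && values.Length > 0)
               {
                   return TensorPrimitives.Min(values);
               }
               bool set = false;
               T min = T.Zero;
               for (int i = 0; i < values.Length; i++)
               {
                   if (!array.IsValid(i))
                       continue;
                   // Propagate NaN, matching TensorPrimitives.Min.
                   if (T.IsNaN(values[i])) return values[i];
                   if (!set) { min = values[i]; set = true; }
                   else if (values[i] < min) min = values[i];
               }
               if (!set) throw new InvalidOperationException("Sequence contains no elements");
               return min;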
           }
   
           public static T Max<T>(this PrimitiveArray<T> array)
               where T : unmanaged, System.Numerics.INumber<T>
           {
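               ReadOnlySpan<T> values = array.Values;
               // TensorPrimitives.Max throws ArgumentException on an empty span, so empty
               // input falls through to the loop and gets the same exception as all-null input.
               if (array.NullCount == 0 && values.Length > 0)
               {
                   return TensorPrimitives.Max(values);
               }
               bool set = false;
               T max = T.Zero;
               for (int i = 0; i < values.Length; i++)
               {
                   if (!array.IsValid(i))
                       continue;
                   // Propagate NaN, matching TensorPrimitives.Max.
                   if (T.IsNaN(values[i])) return values[i];
                   if (!set) { max = values[i]; set = true; }
                   else if (values[i] > max) max = values[i];
               }
               if (!set) throw new InvalidOperationException("Sequence contains no elements");
               return max;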
           }
   
           public static double Mean<T>(this PrimitiveArray<T> array)
               where T : unmanaged, System.Numerics.INumber<T>
           {
               long count = array.Length - array.NullCount;
               if (count == 0) throw new InvalidOperationException("Sequence contains no elements");
               T sum = array.Sum<T>();
               // Divide by the non-null count. CreateChecked converts without boxing or
               // reflection (AOT-friendly); for integer T the sum itself can still wrap around.
               return double.CreateChecked(sum) / count;
           }
   #else
           // Fallbacks for older TFMs: explicit overloads for common types
           public static int Sum(this PrimitiveArray<int> array)
           {
               var values = array.Values;
               if (array.NullCount == 0)
               {
                   int acc = 0;
                   for (int i = 0; i < values.Length; i++) acc += values[i];
                   return acc;
               }
               int res = 0;
               for (int i = 0; i < values.Length; i++)
               {
                   if (array.IsValid(i)) res += values[i];
               }
               return res;
           }
   
           // TODO: add other scalar overloads (long, float, double) following the same pattern.
   #endif
       }
   }
   ```
   
   Notes on the implementation above (must be followed):
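   - Fast path uses TensorPrimitives when NullCount == 0. TensorPrimitives methods are span-based and vectorized where the hardware supports it.
   - Empty input: TensorPrimitives.Min/Max throw ArgumentException on an empty span, so route empty arrays to the scalar path; empty and all-null arrays should then both raise InvalidOperationException. The scalar Min/Max loops should also propagate NaN, as TensorPrimitives.Min/Max do for floating-point input.
   - Null-aware paths are scalar and correct; optimizing them (mask-copy into a pooled buffer, then a vectorized operation) is a follow-up task (Phase 2).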
   - Keep all public APIs in namespace Apache.Arrow.Compute.
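   
   To make the null-aware path concrete, here is a minimal sketch that works on raw spans instead of PrimitiveArray<T>, so it has no Arrow dependency. It assumes Arrow's LSB-first validity bitmap, a values span already sliced to the array's logical range, and a bitmap that is indexed from the array offset; the class and method names are hypothetical. The Mean variant accumulates in double, which avoids integer wrap-around at the cost of precision for long sums beyond 2^53.
   
   ```csharp
   using System;
   using System.Numerics;
   
   // Hypothetical helpers over raw Arrow-style buffers (not an Apache.Arrow API).
   static class NullAwareSketch
   {
       // True if logical element i is valid. Arrow validity bitmaps are LSB-first and
       // are indexed from the array offset. An empty bitmap means "no nulls".
       static bool IsValid(ReadOnlySpan<byte> validity, int offset, int i)
       {
           if (validity.IsEmpty) return true;
           int bit = offset + i;
           return ((validity[bit >> 3] >> (bit & 7)) & 1) != 0;
       }
   
       // Sum of the valid elements; null slots hold unspecified bytes and are skipped.
       public static T Sum<T>(ReadOnlySpan<T> values, ReadOnlySpan<byte> validity, int offset)
           where T : INumber<T>
       {
           T acc = T.Zero;
           for (int i = 0; i < values.Length; i++)
           {
               if (IsValid(validity, offset, i)) acc += values[i];
           }
           return acc;
       }
   
       // Mean of the valid elements, accumulated in double so integer sums cannot wrap.
       public static double Mean<T>(ReadOnlySpan<T> values, ReadOnlySpan<byte> validity, int offset)
           where T : INumber<T>
       {
           double sum = 0;
           long count = 0;
           for (int i = 0; i < values.Length; i++)
           {
               if (!IsValid(validity, offset, i)) continue;
               sum += double.CreateChecked(values[i]);
               count++;
           }
           if (count == 0) throw new InvalidOperationException("Sequence contains no elements");
           return sum / count;
       }
   }
   ```
   
   Reading bits directly like this is also the starting point for the Phase 2 optimization, since whole bytes (or words) of the bitmap can be tested at once instead of one IsValid call per element.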
   
   4) Source: src/Apache.Arrow.Compute/Elementwise.cs (scaffold)
   
   Add skeleton methods for Add/Subtract/Multiply/Divide (net8+ generic 
TensorPrimitives.* overloads and scalar fallback). Include combining validity 
bitmaps using bitwise AND and creating result ArrowBuffer via 
MemoryAllocator.Default.
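   
   A minimal sketch of the validity handling for these kernels, again on raw spans with hypothetical names (not an existing Apache.Arrow API). It assumes both inputs start at bit offset 0; arrays sliced at an offset that is not a multiple of 8 would need their bitmaps shifted first. DivideInt32 shows why null slots must be masked for integer division: the bytes under a null slot are unspecified (frequently 0), so dividing there could throw DivideByZeroException.
   
   ```csharp
   using System;
   using System.Numerics;
   
   // Hypothetical validity helpers for binary elementwise kernels (not an Apache.Arrow API).
   static class ElementwiseSketch
   {
       // ANDs two LSB-first validity bitmaps of `length` bits. Assumes both start at bit
       // offset 0; an empty span stands for "no bitmap, all valid".
       public static byte[] CombineValidity(ReadOnlySpan<byte> left, ReadOnlySpan<byte> right,
                                            int length, out int nullCount)
       {
           int byteCount = (length + 7) / 8;
           var result = new byte[byteCount];
           for (int i = 0; i < byteCount; i++)
           {
               byte l = left.IsEmpty ? (byte)0xFF : left[i];
               byte r = right.IsEmpty ? (byte)0xFF : right[i];
               result[i] = (byte)(l & r);
           }
           // Clear padding bits past `length` so they are not counted as valid.
           if (length % 8 != 0)
               result[byteCount - 1] &= (byte)((1 << (length % 8)) - 1);
   
           int validCount = 0;
           foreach (byte b in result) validCount += BitOperations.PopCount(b);
           nullCount = length - validCount;
           return result;
       }
   
       // Int32 division that never divides under a null slot. Expects a materialized
       // bitmap such as CombineValidity's output. int.MinValue / -1 still throws OverflowException.
       public static int[] DivideInt32(ReadOnlySpan<int> x, ReadOnlySpan<int> y, ReadOnlySpan<byte> validity)
       {
           var result = new int[x.Length];
           for (int i = 0; i < x.Length; i++)
           {
               bool valid = ((validity[i >> 3] >> (i & 7)) & 1) != 0;
               result[i] = valid ? x[i] / y[i] : 0; // 0 is only a placeholder under null slots
           }
           return result;
       }
   }
   ```
   
   For Add/Subtract/Multiply, computing values under null slots is harmless, so on net8+ the value loop could plausibly be a single TensorPrimitives.Add(x, y, destination)-style call over the full spans; integer Divide is the case that needs the masking shown above.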
   
   5) Tensor wrapper: src/Apache.Arrow.Compute/ArrowTensor.cs
   
   Create a small wrapper with these responsibilities:
   - Parse Flatbuf Tensor messages (use existing Flatbuf types in 
src/Apache.Arrow/Flatbuf).
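   - Expose Shape and Strides and a typed view API: TryGetTensorSpan<T>(out TensorSpan<T>) or ToTensor<T>() on net8+. Note that Arrow's Tensor.strides are byte offsets (row-major when omitted), so they must be converted to element strides for a typed view.
   - Ensure zero-copy where the underlying ArrowBuffer memory alignment and type match; otherwise provide a documented copy path.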
   
   Provide an initial minimal implementation that works for contiguous dense 
row-major tensors with element-size matching primitive ArrowBuffers.
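   
   The zero-copy check for this minimal contiguous case could look like the sketch below. It assumes the tensor body, shape, and strides have already been read from the Flatbuf message into spans, and the helper names are hypothetical. It returns a typed view only for dense row-major data on a little-endian host; anything else would take the copy path.
   
   ```csharp
   using System;
   using System.Runtime.CompilerServices;
   using System.Runtime.InteropServices;
   
   // Hypothetical helpers for viewing an Arrow tensor body without copying.
   static class TensorViewSketch
   {
       // Row-major (C-order) strides in bytes: the layout Arrow assumes when
       // Tensor.strides is omitted.
       public static long[] RowMajorByteStrides(ReadOnlySpan<long> shape, int elementSize)
       {
           var strides = new long[shape.Length];
           long stride = elementSize;
           for (int d = shape.Length - 1; d >= 0; d--)
           {
               strides[d] = stride;
               stride *= shape[d];
           }
           return strides;
       }
   
       // Returns a typed, zero-copy view if the data is dense and row-major and the host is
       // little-endian; otherwise returns false and the caller should copy instead.
       public static bool TryGetContiguousView<T>(ReadOnlySpan<byte> data, ReadOnlySpan<long> shape,
                                                  ReadOnlySpan<long> byteStrides, out ReadOnlySpan<T> view)
           where T : unmanaged
       {
           view = default;
           if (!BitConverter.IsLittleEndian) return false;
   
           int elementSize = Unsafe.SizeOf<T>();
           ReadOnlySpan<long> expected = RowMajorByteStrides(shape, elementSize);
           if (!byteStrides.IsEmpty && !byteStrides.SequenceEqual(expected)) return false;
   
           long count = 1;
           foreach (long dim in shape) count *= dim;
           long byteLength = count * elementSize;
           if (byteLength > data.Length) return false;
   
           // Reinterpret the bytes as T; no allocation or copy happens here.
           view = MemoryMarshal.Cast<byte, T>(data.Slice(0, (int)byteLength));
           return true;
       }
   }
   ```
   
   As far as we understand, TensorSpan<T> expects element-based strides, so a contiguous view like this one would use the byte strides divided by the element size (for shape [2, 3] of int: [12, 4] bytes become [3, 1] elements).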
   
   6) Tests: test/Apache.Arrow.Compute.Tests/AggregationsTests.cs
   
   Add unit tests for:
   - Sum_Int32_NoNulls: sum values {1,2,3,4} == 10
   - Sum_Int32_WithNulls: values {1,null,3} == 4
   - Min/Max/Mean for float/double
   - Empty and all-null cases: Sum returns 0; Min, Max, and Mean throw InvalidOperationException
   
   Use xUnit and follow existing repo test patterns (see other tests under 
test/ folder).
   
   7) Benchmarks: benchmarks/ComputeBenchmarks/ComputeBenchmarks.cs
   
   Add a BenchmarkDotNet class that compares Sum implementations (scalar loop vs. TensorPrimitives.Sum) for N = 10K, 100K, and 1M elements, for both float and double. The results should show the speedup on SIMD-capable hardware.
   
   8) Documentation updates
   
   - README.md: add a "Compute" section with usage examples for Sum and elementwise Add. Provide a code snippet showing how to call `long s = myInt64Array.Sum();` or `double mean = myDoubleArray.Mean();`, and how to convert an Arrow Tensor to TensorSpan<T>.
   - docs: add a short page under docs/ describing compute design and null 
semantics.
   
   Detailed implementation notes & gotchas (do not skip)
   
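   - Null semantics: TensorPrimitives has no validity concept. For correctness, either:
     - (EASY) Use the scalar null-aware path as implemented above when NullCount > 0.
     - (FAST) Copy the values into a pooled (ArrayPool) buffer of T with null slots replaced by the operation's identity (0 for Sum, MaxValue or +Infinity for Min, MinValue or -Infinity for Max) and run TensorPrimitives on the buffer. This removes per-element branching from the reduction at the cost of an extra copy (the pool avoids a fresh allocation per call); to be implemented later. Provide helper methods to Rent/Return buffers.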
   
   - Generic math (INumber<T>) requires .NET 7 or later; guard generic code with #if NET8_0_OR_GREATER (the repo's modern target) and provide typed scalar fallbacks for older TFMs.
   
   - AOT and trimming: avoid boxing or reflection-based numeric conversions (e.g., prefer double.CreateChecked over Convert.ToDouble(object)) and avoid method bodies that rely on JIT-only assumptions. Keep code simple and, if needed, verify NativeAOT compatibility in CI.
   
   - Byte-order and alignment: TensorPrimitives operates on spans of T; ensure 
the ArrowBuffer Span.CastTo<T>() uses correct element size and is 
little-endian. The repo currently asserts little-endian in Flatbuffers helpers; 
consult Table.__vector_as_span implementation for guidance.
   
   - Tests must include cross-target verification: run them on net8.0 and on net462 (which compiles the same fallback code as netstandard2.0) to ensure the scalar fallbacks work. netstandard2.0 itself cannot be a test target.
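   
   The (FAST) masking strategy from the null-semantics note can be sketched as follows for integer Min. This is an illustration under stated assumptions rather than the planned kernel: it uses System.Numerics.Vector<T> as a stand-in for TensorPrimitives.Min so that it needs no package, restricts T to integer types so that T.MaxValue is a true identity, and leaves the empty and all-null checks to the caller (both would otherwise return T.MaxValue). MaskedKernels and MaskedMin are hypothetical names.
   
   ```csharp
   using System;
   using System.Buffers;
   using System.Numerics;
   
   // Hypothetical "mask, then vectorize" helper (not an Apache.Arrow API).
   static class MaskedKernels
   {
       // Min over valid elements: copy into a pooled buffer with null slots replaced by
       // T.MaxValue (the identity for integer Min), then reduce without validity checks.
       // Caller must reject empty and all-null inputs first; both would yield T.MaxValue.
       public static T MaskedMin<T>(ReadOnlySpan<T> values, ReadOnlySpan<byte> validity)
           where T : unmanaged, IBinaryInteger<T>, IMinMaxValue<T>
       {
           T[] rented = ArrayPool<T>.Shared.Rent(values.Length);
           try
           {
               Span<T> masked = rented.AsSpan(0, values.Length);
               for (int i = 0; i < values.Length; i++)
               {
                   bool valid = ((validity[i >> 3] >> (i & 7)) & 1) != 0;
                   masked[i] = valid ? values[i] : T.MaxValue;
               }
   
               // Vectorized reduction; the real kernel would call TensorPrimitives.Min(masked).
               T result = T.MaxValue;
               int j = 0;
               int width = Vector<T>.Count;
               if (Vector.IsHardwareAccelerated && masked.Length >= width)
               {
                   var acc = new Vector<T>(masked);
                   for (j = width; j <= masked.Length - width; j += width)
                       acc = Vector.Min(acc, new Vector<T>(masked.Slice(j)));
                   for (int lane = 0; lane < width; lane++)
                       result = T.Min(result, acc[lane]);
               }
               for (; j < masked.Length; j++) // scalar tail, or the whole span without SIMD
                   result = T.Min(result, masked[j]);
               return result;
           }
           finally
           {
               ArrayPool<T>.Shared.Return(rented);
           }
       }
   }
   ```
   
   Floating-point Min would need +Infinity rather than MaxValue as the identity, plus NaN handling, which is one reason the sketch is limited to integers.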
   
   PR checklist (step-by-step commands for the contributor)
   
   1. Create a branch off main: git checkout -b feature/compute-tensorprimitives
   2. Edit Directory.Packages.props: add System.Numerics.Tensors package 
version.
   3. Add project folder src/Apache.Arrow.Compute with csproj and source files 
above.
   4. Add tests to test/Apache.Arrow.Compute.Tests and add project references.
   5. Add benchmarks folder and a small benchmark class.
   6. Add new projects to the root solution file if desired: dotnet sln add 
src/Apache.Arrow.Compute/Apache.Arrow.Compute.csproj
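   7. Build and run tests: dotnet build && dotnet test --framework net8.0 && dotnet test --framework net462 (the net462 run typically requires Windows; netstandard2.0 is a library-only target and cannot be passed to dotnet test)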
   8. Run the benchmark locally: dotnet run -c Release --project 
benchmarks/ComputeBenchmarks
   9. Open PR, include benchmark results, and link to tests and new docs.
   
   Estimated effort and risk
   
   - Estimated time: 2–3 days for a junior dev to complete Phase 1 
(aggregations + tests + docs) with guidance and code skeletons provided here.  
   - Risk: low to moderate. Multi-targeting pitfalls and generic math 
availability are the main sources of friction. Null-handling performance 
optimizations are optional for this ticket — correctness first.
   
   Follow-on tasks (not part of this issue)
   
   - Implement masked, vectorized null handling to replace the branch-per-element scalar fallback loops.  
   - Implement elementwise ops fully and add broadcasting support where 
applicable.  
   - Integrate Tensor<T> operator-based expressions on newer runtimes (net10+) 
when available.  
   - Add compute kernel registration/discovery API to allow higher-level 
compute engines to pick implementations.
   
   Acceptance sign-off
   
   - All unit tests pass on net8.0 and net462 (the latter covers the non-generic fallbacks shared with the netstandard2.0 build).  
   - Benchmarks show the TensorPrimitives fast path is faster than scalar loop 
for large arrays (include results in PR).  
   - README updated with usage examples.  
   - PR reviewed and merged.

