georgevanburgh opened a new pull request, #44576:
URL: https://github.com/apache/arrow/pull/44576

   For code that repeatedly accesses columns by name, this LINQ expression can
   form part of the hot path. This PR replaces the LINQ with the equivalent
   `for` loop.
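   
   As a rough illustration of the kind of change described, here is a minimal,
   self-contained sketch. It is not the code from this PR: `ColumnLookupSketch`,
   `IndexOfLinq`, and `IndexOfLoop` are hypothetical names, and a plain list of
   strings stands in for the schema's field names.
   
   ```cs
   using System;
   using System.Collections.Generic;
   using System.Linq;
   
   // Hypothetical helper for illustration only; not part of Apache.Arrow.
   public static class ColumnLookupSketch
   {
       // LINQ form: each call creates a closure over `name` and `comparer`
       // plus an iterator object, so it allocates on every lookup.
       public static int IndexOfLinq(
           IReadOnlyList<string> names, string name, IEqualityComparer<string> comparer)
       {
           // Count the leading entries that do not match; that count is the index.
           int skipped = names.TakeWhile(n => !comparer.Equals(n, name)).Count();
           return skipped < names.Count ? skipped : -1;
       }
   
       // Loop form: same result, with no delegate or iterator allocation.
       public static int IndexOfLoop(
           IReadOnlyList<string> names, string name, IEqualityComparer<string> comparer)
       {
           for (int i = 0; i < names.Count; i++)
           {
               if (comparer.Equals(names[i], name))
               {
                   return i;
               }
           }
           return -1; // not found
       }
   }
   ```
   
   Both methods return the index of the first matching name, or -1 when there is
   no match. For example,
   `ColumnLookupSketch.IndexOfLoop(new[] { "A", "B", "H" }, "H", StringComparer.Ordinal)`
   returns 2.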
   
   I ran a quick benchmark to validate the speedup:
   
   ```cs
   [MemoryDiagnoser]
   public class ColumnIndexerBenchmark
   {
       private readonly RecordBatch _batch;
   
       public ColumnIndexerBenchmark()
       {
           var builder = new Schema.Builder();
           builder
               .Field(new Field("A", Int32Type.Default, true))
               .Field(new Field("B", Int32Type.Default, true))
               .Field(new Field("C", Int32Type.Default, true))
               .Field(new Field("D", Int32Type.Default, true))
               .Field(new Field("E", Int32Type.Default, true))
               .Field(new Field("F", Int32Type.Default, true))
               .Field(new Field("G", Int32Type.Default, true))
               .Field(new Field("H", Int32Type.Default, true))
               .Field(new Field("I", Int32Type.Default, true))
               .Field(new Field("J", Int32Type.Default, true));
           var schema = builder.Build();
           _batch = new RecordBatch(schema, new IArrowArray[schema.FieldsList.Count], 0);
       }
   
       [Benchmark]
       public void GetColumnByIndex()
       {
           _batch.Column("H", StringComparer.Ordinal);
       }
   }
   
   ```
   
   Some numbers from my machine:
   
   ```
   BenchmarkDotNet v0.14.0, Windows 10 (10.0.19045.5011/22H2/2022Update)
   13th Gen Intel Core i7-13800H, 1 CPU, 20 logical and 14 physical cores
   .NET SDK 8.0.306
     [Host]     : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2
     DefaultJob : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2
   ```
   
   | Method                  | Mean     | Error     | StdDev    | Gen0   | Allocated |
   |------------------------ |---------:|----------:|----------:|-------:|----------:|
   | GetColumnByIndexLinq    | 67.84 ns | 1.178 ns  | 1.102 ns  | 0.0107 |     136 B |
   | GetColumnByIndexForLoop | 9.428 ns | 0.1334 ns | 0.1114 ns |      - |         - |
   
   This method should already be covered by existing tests.
   
   If merged, will close #44575.

