This is an automated email from the ASF dual-hosted git repository.

curth pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
     new 5e60823aaa GH-44575: [C#] Replace LINQ expression with for loop 
(#44576)
5e60823aaa is described below

commit 5e60823aaa3403026811b936e902dec21456d793
Author: George Vanburgh <1670176+georgevanbu...@users.noreply.github.com>
AuthorDate: Wed Oct 30 00:08:53 2024 -0400

    GH-44575: [C#] Replace LINQ expression with for loop (#44576)
    
    For code which repeatedly access columns by name, this LINQ expression can 
form part of the hot path. This PR replaces the LINQ with the equivalent for 
loop, and should preserve all existing behaviour ([return -1 in the event of no 
match](https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1.indexof?view=net-8.0#system-collections-generic-list-1-indexof(-0))).
    
    I ran a quick benchmark to validate the speedup
    
    ```cs
    [MemoryDiagnoser]
    public class ColumnIndexerBenchmark
    {
        private readonly RecordBatch _batch;
    
        public ColumnIndexerBenchmark()
        {
            var builder = new Schema.Builder();
            builder
                .Field(new Field("A", Int32Type.Default, true))
                .Field(new Field("B", Int32Type.Default, true))
                .Field(new Field("C", Int32Type.Default, true))
                .Field(new Field("D", Int32Type.Default, true))
                .Field(new Field("E", Int32Type.Default, true))
                .Field(new Field("F", Int32Type.Default, true))
                .Field(new Field("G", Int32Type.Default, true))
                .Field(new Field("H", Int32Type.Default, true))
                .Field(new Field("I", Int32Type.Default, true))
                .Field(new Field("J", Int32Type.Default, true));
            var schema = builder.Build();
            _batch = new RecordBatch(schema, new 
IArrowArray[schema.FieldsList.Count], 0);
        }
    
        [Benchmark]
        public void GetColumnByIndex()
        {
            _batch.Column("H", StringComparer.Ordinal);
        }
    }
    
    ```
    
    Some numbers from my machine
    
    ```
    BenchmarkDotNet v0.14.0, Windows 10 (10.0.19045.5011/22H2/2022Update)
    13th Gen Intel Core i7-13800H, 1 CPU, 20 logical and 14 physical cores
    .NET SDK 8.0.306
      [Host]     : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2
      DefaultJob : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2
    ```
    
    | Method                  | Mean     | Error     | StdDev    | Gen0   | 
Allocated |
    |------------------------ 
|---------:|----------:|----------:|-------:|----------:|
    | GetColumnByIndexLinq    | 67.84 ns | 1.178 ns  | 1.102 ns  | 0.0107 |     
136 B |
    | GetColumnByIndexForLoop | 9.428 ns | 0.1334 ns | 0.1114 ns |      - |     
    - |
    
    In theory, we could achieve a greater speedup by maintaining a lookup of 
column names to ordinals. We already have several lookup structures inside 
`Schema`, but none of them provides access to ordinal values. However, the 
speedup from adding another mapping might not warrant adding yet another lookup 
structure to `Schema`.
    
    If merged, will close #44575.
    
    * GitHub Issue: #44575
    
    Authored-by: George Vanburgh <gvanbu...@bloomberg.net>
    Signed-off-by: Curt Hagenlocher <c...@hagenlocher.org>
---
 csharp/src/Apache.Arrow/Schema.cs | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/csharp/src/Apache.Arrow/Schema.cs 
b/csharp/src/Apache.Arrow/Schema.cs
index 4357e8b2dd..32615e5d67 100644
--- a/csharp/src/Apache.Arrow/Schema.cs
+++ b/csharp/src/Apache.Arrow/Schema.cs
@@ -82,7 +82,13 @@ namespace Apache.Arrow
         {
             comparer ??= StringComparer.CurrentCulture;
 
-            return _fieldsList.IndexOf(_fieldsList.First(x => 
comparer.Equals(x.Name, name)));
+            for (int i = 0; i < _fieldsList.Count; i++)
+            {
+                if (comparer.Equals(_fieldsList[i].Name, name))
+                    return i;
+            }
+
+            return -1;
         }
 
         public Schema RemoveField(int fieldIndex)

Reply via email to