This is an automated email from the ASF dual-hosted git repository. curth pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/main by this push: new 5e60823aaa GH-44575: [C#] Replace LINQ expression with for loop (#44576) 5e60823aaa is described below commit 5e60823aaa3403026811b936e902dec21456d793 Author: George Vanburgh <1670176+georgevanbu...@users.noreply.github.com> AuthorDate: Wed Oct 30 00:08:53 2024 -0400 GH-44575: [C#] Replace LINQ expression with for loop (#44576) For code which repeatedly access columns by name, this LINQ expression can form part of the hot path. This PR replaces the LINQ with the equivalent for loop, and should preserve all existing behaviour ([return -1 in the event of no match](https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1.indexof?view=net-8.0#system-collections-generic-list-1-indexof(-0))). I ran a quick benchmark to validate the speedup ```cs [MemoryDiagnoser] public class ColumnIndexerBenchmark { private readonly RecordBatch _batch; public ColumnIndexerBenchmark() { var builder = new Schema.Builder(); builder .Field(new Field("A", Int32Type.Default, true)) .Field(new Field("B", Int32Type.Default, true)) .Field(new Field("C", Int32Type.Default, true)) .Field(new Field("D", Int32Type.Default, true)) .Field(new Field("E", Int32Type.Default, true)) .Field(new Field("F", Int32Type.Default, true)) .Field(new Field("G", Int32Type.Default, true)) .Field(new Field("H", Int32Type.Default, true)) .Field(new Field("I", Int32Type.Default, true)) .Field(new Field("J", Int32Type.Default, true)); var schema = builder.Build(); _batch = new RecordBatch(schema, new IArrowArray[schema.FieldsList.Count], 0); } [Benchmark] public void GetColumnByIndex() { _batch.Column("H", StringComparer.Ordinal); } } ``` Some numbers from my machine ``` BenchmarkDotNet v0.14.0, Windows 10 (10.0.19045.5011/22H2/2022Update) 13th Gen Intel Core i7-13800H, 1 CPU, 20 logical and 14 physical cores .NET SDK 8.0.306 [Host] : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2 DefaultJob : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2 ``` | Method | Mean | Error | StdDev | Gen0 | Allocated | |------------------------ |---------:|----------:|----------:|-------:|----------:| | GetColumnByIndexLinq | 67.84 ns | 1.178 ns | 1.102 ns | 0.0107 | 136 B | | GetColumnByIndexForLoop | 9.428 ns | 0.1334 ns | 0.1114 ns | - | - | In theory, we could achieve a greater speedup by maintaining a lookup of column names to ordinals. We already have several lookup structures inside `Schema`, but none of them provides access to ordinal values. However, the speedup from adding another mapping might not warrant adding yet another lookup structure to `Schema`. If merged, will close #44575. * GitHub Issue: #44575 Authored-by: George Vanburgh <gvanbu...@bloomberg.net> Signed-off-by: Curt Hagenlocher <c...@hagenlocher.org> --- csharp/src/Apache.Arrow/Schema.cs | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/csharp/src/Apache.Arrow/Schema.cs b/csharp/src/Apache.Arrow/Schema.cs index 4357e8b2dd..32615e5d67 100644 --- a/csharp/src/Apache.Arrow/Schema.cs +++ b/csharp/src/Apache.Arrow/Schema.cs @@ -82,7 +82,13 @@ namespace Apache.Arrow { comparer ??= StringComparer.CurrentCulture; - return _fieldsList.IndexOf(_fieldsList.First(x => comparer.Equals(x.Name, name))); + for (int i = 0; i < _fieldsList.Count; i++) + { + if (comparer.Equals(_fieldsList[i].Name, name)) + return i; + } + + return -1; } public Schema RemoveField(int fieldIndex)