[PR] GH-49069: [C++] Share Trie instances across CSV value decoders [arrow]

via GitHub Thu, 29 Jan 2026 20:42:21 -0800


HyukjinKwon opened a new pull request, #49070:
URL: https://github.com/apache/arrow/pull/49070


   ### Rationale for this change
   
   The CSV converter was building identical Trie data structures (for 
null/true/false values) in every decoder instance, causing duplicate memory 
allocation and initialization overhead.
   
   ### What changes are included in this PR?
   
   - Introduced `TrieCache` struct to hold shared Trie instances (null_trie, 
true_trie, false_trie)
   - Updated `ValueDecoder` and all decoder subclasses to accept and reference 
a shared `TrieCache` instead of building their own Tries
   - Updated `Converter` base class to create one `TrieCache` per converter and 
pass it to all decoders
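
The sharing pattern described in the list above might look roughly like the following. This is a minimal, self-contained sketch and not the code in this PR: `SimpleTrie` is a hypothetical stand-in for Arrow's internal Trie, and `MakeTrieCache`, the `std::shared_ptr` ownership, and the `Decode`/`IsNull` signatures are assumptions made for illustration. Only the names `TrieCache`, `ValueDecoder`, `null_trie`, `true_trie`, and `false_trie` come from the description.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <string_view>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for Arrow's internal Trie: exact-match lookup only.
class SimpleTrie {
 public:
  explicit SimpleTrie(const std::vector<std::string>& values)
      : values_(values.begin(), values.end()) {}

  bool Contains(std::string_view s) const {
    return values_.count(std::string(s)) > 0;
  }

 private:
  std::unordered_set<std::string> values_;
};

// Holds the three lookup structures so they are built once and shared.
struct TrieCache {
  SimpleTrie null_trie;
  SimpleTrie true_trie;
  SimpleTrie false_trie;
};

// Assumed helper: builds one cache from the configured spellings.
std::shared_ptr<const TrieCache> MakeTrieCache(
    const std::vector<std::string>& null_values,
    const std::vector<std::string>& true_values,
    const std::vector<std::string>& false_values) {
  return std::make_shared<TrieCache>(TrieCache{SimpleTrie(null_values),
                                               SimpleTrie(true_values),
                                               SimpleTrie(false_values)});
}

// Base decoder: references the shared cache instead of building its own tries.
class ValueDecoder {
 public:
  explicit ValueDecoder(std::shared_ptr<const TrieCache> cache)
      : cache_(std::move(cache)) {
    assert(cache_ != nullptr);
  }

  bool IsNull(std::string_view s) const { return cache_->null_trie.Contains(s); }

  // Exposed only so the sharing can be observed in this sketch.
  const TrieCache* cache() const { return cache_.get(); }

 protected:
  std::shared_ptr<const TrieCache> cache_;
};

// Example subclass: decodes booleans using the shared true/false tries.
class BooleanDecoder : public ValueDecoder {
 public:
  using ValueDecoder::ValueDecoder;

  // Returns true and writes *out if s is a recognized boolean spelling.
  bool Decode(std::string_view s, bool* out) const {
    if (cache_->true_trie.Contains(s)) {
      *out = true;
      return true;
    }
    if (cache_->false_trie.Contains(s)) {
      *out = false;
      return true;
    }
    return false;
  }
};
```

In this sketch a converter would call `MakeTrieCache` once and hand the same pointer to every decoder it constructs, so the tries are initialized a single time regardless of how many decoders exist.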
   
   ### Are these changes tested?
   
   Yes, the existing tests cover these changes. A simple benchmark showed 
roughly 2-4% faster converter creation; memory usage should also be lower, 
since the Tries are no longer duplicated per decoder.
   
   ### Are there any user-facing changes?
   
   No.

