[ https://issues.apache.org/jira/browse/ORC-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Taiyang Li updated ORC-1950:
----------------------------
    Summary: [C++] Replace std::unorder_map with google dense_hash_map in SortedStringDictionary and remove reorder to improve write performance of dict-encoding columns  (was: [C++] Replace std::unorder_map with google dense_hash_map as SortedStringDictionary and remove reorder to improve write performance of dict-encoding columns)

> [C++] Replace std::unorder_map with google dense_hash_map in
> SortedStringDictionary and remove reorder to improve write performance of
> dict-encoding columns
> ---------------------------------------------------------------------------
>
>                 Key: ORC-1950
>                 URL: https://issues.apache.org/jira/browse/ORC-1950
>             Project: ORC
>          Issue Type: Bug
>            Reporter: Taiyang Li
>            Priority: Major
>
> Replace std::unordered_map with Google's dense_hash_map in SortedStringDictionary
> and remove the reorder step to improve write performance of dictionary-encoded columns.
>
> POC:
> [https://github.com/bigo-sg/ClickHouse/commit/b9fc51fd8ded21f84f31cfa169350906b9f14456]
>
> baseline:
> {code:java}
> 2025-07-08T16:22:54+08:00
> Running ./build_gcc/src/Common/benchmarks/orc_string_dictionary
> Run on (96 X 2900 MHz CPU s)
> CPU Caches:
>   L1 Data 32 KiB (x48)
>   L1 Instruction 32 KiB (x48)
>   L2 Unified 1024 KiB (x48)
>   L3 Unified 36608 KiB (x2)
> Load Average: 27.44, 62.03, 43.39
> Benchmark                                                            Time             CPU   Iterations
> BM_writeStringDictionary<NewSortedStringDictionary, 10>       49801815 ns     49800922 ns           11
> BM_writeStringDictionary<NewSortedStringDictionary, 100>      60295648 ns     60294001 ns           12
> BM_writeStringDictionary<NewSortedStringDictionary, 1000>     73385081 ns     73383192 ns           10
> BM_writeStringDictionary<NewSortedStringDictionary, 10000>   121725939 ns    121642493 ns            6
> BM_writeStringDictionary<NewSortedStringDictionary, 100000>  232034759 ns    232031059 ns            3
> {code}
> Opt1:
>
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)