[ https://issues.apache.org/jira/browse/ORC-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Taiyang Li updated ORC-1950:
----------------------------
    Summary: [C++] Replace std::unordered_map with google dense_hash_map in
SortedStringDictionary and remove reorder to improve write performance of
dict-encoding columns  (was: [C++] Replace std::unordered_map with google
dense_hash_map as SortedStringDictionary and remove reorder to improve write
performance of dict-encoding columns)

> [C++] Replace std::unordered_map with google dense_hash_map in
> SortedStringDictionary and remove reorder to improve write performance of
> dict-encoding columns
> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ORC-1950
>                 URL: https://issues.apache.org/jira/browse/ORC-1950
>             Project: ORC
>          Issue Type: Bug
>            Reporter: Taiyang Li
>            Priority: Major
>
> Replace std::unordered_map with Google's dense_hash_map in
> SortedStringDictionary and remove the reorder step to improve the write
> performance of dictionary-encoded columns.
>  
> POC: 
> [https://github.com/bigo-sg/ClickHouse/commit/b9fc51fd8ded21f84f31cfa169350906b9f14456]
>  
>  
> Baseline:
> {code:java}
> 2025-07-08T16:22:54+08:00
> Running ./build_gcc/src/Common/benchmarks/orc_string_dictionary
> Run on (96 X 2900 MHz CPU s)
> CPU Caches:
>   L1 Data 32 KiB (x48)
>   L1 Instruction 32 KiB (x48)
>   L2 Unified 1024 KiB (x48)
>   L3 Unified 36608 KiB (x2)
> Load Average: 27.44, 62.03, 43.39
> Benchmark                                                              Time             CPU   Iterations
> BM_writeStringDictionary<NewSortedStringDictionary, 10>         49801815 ns     49800922 ns           11
> BM_writeStringDictionary<NewSortedStringDictionary, 100>        60295648 ns     60294001 ns           12
> BM_writeStringDictionary<NewSortedStringDictionary, 1000>       73385081 ns     73383192 ns           10
> BM_writeStringDictionary<NewSortedStringDictionary, 10000>     121725939 ns    121642493 ns            6
> BM_writeStringDictionary<NewSortedStringDictionary, 100000>    232034759 ns    232031059 ns            3
> {code}
> Opt1: 
>  
>  
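
For context on the "reorder" step named in the description: if dictionary ids are handed out in first-seen order but the dictionary is written in key order, every buffered id has to be remapped through the inverse of the sort permutation before the data stream is encoded. The sketch below shows that remapping under the assumption that this is what "reorder" refers to; the function name sortAndReorder and the use of std::int64_t for the id buffer are hypothetical, and this is not the ORC or POC implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical "sort then reorder" flush for a string dictionary.
// entries[i] is the string that received insertion-order id i; ids holds the
// insertion-order id of every buffered value. On return, entries is sorted
// and every element of ids refers to its entry's sorted position.
void sortAndReorder(std::vector<std::string>& entries,
                    std::vector<std::int64_t>& ids) {
  // order[k] = insertion id of the entry that lands at sorted position k.
  std::vector<std::size_t> order(entries.size());
  for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
  std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
    return entries[a] < entries[b];
  });

  // remap[insertion id] = sorted position (the inverse permutation).
  std::vector<std::int64_t> remap(entries.size());
  std::vector<std::string> sorted;
  sorted.reserve(entries.size());
  for (std::size_t k = 0; k < order.size(); ++k) {
    remap[order[k]] = static_cast<std::int64_t>(k);
    sorted.push_back(std::move(entries[order[k]]));
  }
  entries = std::move(sorted);

  // The reorder pass itself: one read-modify-write over every buffered id.
  for (std::int64_t& id : ids) id = remap[static_cast<std::size_t>(id)];
}
```

For entries {"pear", "apple", "fig"} and buffered ids {0, 1, 0, 2, 1}, the call leaves entries as {"apple", "fig", "pear"} and ids as {2, 0, 2, 1, 0}. The final loop touches every buffered value rather than every distinct one, so its cost grows with the row count; removing the reorder step saves that pass.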



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
