[ https://issues.apache.org/jira/browse/KUDU-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905984#comment-15905984 ]
Todd Lipcon commented on KUDU-1930:
-----------------------------------
Tested with commands like:
{code}
./build/latest/bin/tpch_real_world -tpch_path_to_ts_flags_file ./tsflags \
  -tpch_scaling_factor 100 -tpch_num_inserters 8 -notpch_run_queries \
  -tpch_path_to_dbgen_dir /data/2/mpercy/tpch_2_17_0/dbgen \
  -tpch_partition_strategy hash
{code}
(the new 'hash' partition strategy is from a simple local patch)
Results:
{code}
with 1 MM thread:
I0310 13:53:38.925390 1568 tpch_real_world.cc:278] Time spent by thread 2 to load generated data into the database: real 1140.187s user 398.411s sys 5.315s
with 4 MM threads:
I0310 14:06:35.509299 8524 tpch_real_world.cc:278] Time spent by thread 4 to load generated data into the database: real 618.348s user 413.431s sys 5.437s
with 4 MM threads, hash partition:
I0310 15:32:52.386118 27623 tpch_real_world.cc:289] Time spent by thread 2 to load generated data into the database: real 1233.386s user 462.671s sys 6.084s
with 4 MM threads, using dense_hash_map instead of std::unordered_map for dictionary builder:
I0310 17:26:00.682138 32076 tpch_real_world.cc:289] Time spent by thread 0 to load generated data into the database: real 1147.478s user 464.147s sys 6.200s
{code}
The "user" times here are from the client side, so they are not very relevant;
"real" is the total wall-clock time. Switching the dictionary builder to
dense_hash_map looks like an easy 7% speedup relative to the STL map (1147s vs.
1233s of wall time, both with 4 MM threads and hash partitioning). As we've long
known, inserting in sorted order (range partitioned) is about 2x faster than
inserting in unsorted order (618s vs. 1233s here), and the gap grows the longer
the benchmark runs.
> Improve performance of dictionary builder
> -----------------------------------------
>
> Key: KUDU-1930
> URL: https://issues.apache.org/jira/browse/KUDU-1930
> Project: Kudu
> Issue Type: Bug
> Components: cfile, perf
> Affects Versions: 1.3.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
>
> I locally tweaked tpch_real_world to use hash partitioning instead of range
> partitioning, so that the different threads overlapped on the same tablets,
> simulating a more realistic parallel load scenario. I noticed that the MM
> threads were CPU-bound, with a high percentage of CPU time in AddCodeWords().
> Initial prototypes indicate that optimizing the hash map used here would be an
> easy win.
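To make the hot path concrete: a dictionary builder maps each distinct cell value to a small integer code word, so in a simple design every appended value costs at least one hash-map lookup. The C++ sketch below is a hypothetical, simplified builder, not Kudu's cfile code; `DictBuilder` and `AddCodeWord()` are illustrative names, and the map type is a template parameter defaulting to std::unordered_map so the lookup structure is the only thing that varies.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified dictionary builder (NOT Kudu's implementation).
// Each distinct value gets the next integer code word; repeated values reuse
// their existing code. `Map` is the value -> code lookup table, i.e. the
// structure whose choice the benchmark above compares.
template <typename Map = std::unordered_map<std::string, std::uint32_t>>
class DictBuilder {
 public:
  // Returns the code word for `value`, assigning a new one the first time
  // the value is seen.
  std::uint32_t AddCodeWord(const std::string& value) {
    // Fast path: a single lookup when the value is already in the dictionary.
    auto it = codes_.find(value);
    if (it != codes_.end()) return it->second;

    // Miss: assign the next code word and record the mapping both ways.
    const auto code = static_cast<std::uint32_t>(values_.size());
    codes_.insert(std::make_pair(value, code));
    values_.push_back(value);
    return code;
  }

  // Number of distinct values seen so far.
  std::size_t size() const { return values_.size(); }

  // Returns the value for a previously assigned code word.
  const std::string& value(std::uint32_t code) const { return values_[code]; }

 private:
  Map codes_;                        // value -> code word (the hot lookup)
  std::vector<std::string> values_;  // code word -> value, in assignment order
};
```

The sketch sticks to `find()` and `insert()` because both std::unordered_map and google::dense_hash_map (from sparsehash) provide them. Passing dense_hash_map as `Map` is not quite drop-in, though: it must be given an empty key via `set_empty_key()` before the first insert, so the builder would need a constructor hook for that. A plausible reason the open-addressing map wins is that it stores entries in one flat array, whereas common std::unordered_map implementations allocate a node per entry; in this sketch the string keys still allocate either way.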
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)