[jira] [Commented] (KUDU-1930) Improve performance of dictionary builder

Todd Lipcon (JIRA) Mon, 13 Mar 2017 22:29:46 -0700

    [ 
https://issues.apache.org/jira/browse/KUDU-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923600#comment-15923600
 ]


Todd Lipcon commented on KUDU-1930:
-----------------------------------

Another factor here: it seems that most of the calls into AddCodeWords pass 
only one value at a time. Perhaps the MRS compaction input is yielding 
"blocks" of a single row, in which case we get really poor batching (and thus 
poor cache locality in the dict lookups, etc). Based on perf counters, this is 
a major source of cache misses.
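
To make the call pattern concrete, here is a minimal sketch of the two ways a 
caller can feed rows into a dictionary builder. It is not Kudu's code: 
ToyDictBuilder and FeedInBatches are hypothetical names, the method is only 
named after the AddCodeWords() call mentioned in the issue description, and 
std::unordered_map stands in for whatever hashmap the real builder uses. The 
sketch shows only the shape of the calls and that the output is the same 
either way; it does not measure cache behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a dictionary block builder (not Kudu's class).
class ToyDictBuilder {
 public:
  // Encodes `count` values in one call and returns how many were consumed.
  std::size_t AddCodeWords(const std::string* vals, std::size_t count) {
    ++num_calls_;
    for (std::size_t i = 0; i < count; ++i) {
      auto it = dict_.find(vals[i]);
      if (it == dict_.end()) {
        // Unseen value: assign the next codeword (the current dictionary size).
        const std::uint32_t next_code =
            static_cast<std::uint32_t>(dict_.size());
        it = dict_.emplace(vals[i], next_code).first;
      }
      codes_.push_back(it->second);
    }
    return count;
  }

  const std::vector<std::uint32_t>& codes() const { return codes_; }
  std::size_t num_calls() const { return num_calls_; }

 private:
  std::unordered_map<std::string, std::uint32_t> dict_;
  std::vector<std::uint32_t> codes_;
  std::size_t num_calls_ = 0;
};

// Feeds `rows` to the builder in chunks of `batch_size` values per call.
// A batch_size of 1 mimics an input that yields single-row blocks.
inline void FeedInBatches(ToyDictBuilder* builder,
                          const std::vector<std::string>& rows,
                          std::size_t batch_size) {
  assert(batch_size > 0);
  for (std::size_t off = 0; off < rows.size(); off += batch_size) {
    const std::size_t n = std::min(batch_size, rows.size() - off);
    builder->AddCodeWords(rows.data() + off, n);
  }
}
```

With batch_size set to 1, every row pays the full per-call overhead and the 
lookup loop body runs exactly once per call; with a larger batch_size the same 
codewords come out of far fewer calls, so consecutive lookups run back to back 
in one tight loop.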

> Improve performance of dictionary builder
> -----------------------------------------
>
>                 Key: KUDU-1930
>                 URL: https://issues.apache.org/jira/browse/KUDU-1930
>             Project: Kudu
>          Issue Type: Bug
>          Components: cfile, perf
>    Affects Versions: 1.3.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>
> I locally tweaked tpch_real_world to use hash partitioning instead of range 
> partitioning, so that the different threads overlapped on the same tablets, 
> simulating a more realistic parallel load scenario. I noticed that the 
> maintenance manager (MM) threads were CPU-bound, with a high percentage of 
> CPU time in AddCodeWords(). 
> Initial prototypes indicate that optimizing the hashmap used here would be an 
> easy win.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
