[ https://issues.apache.org/jira/browse/KUDU-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved KUDU-2844.
-------------------------------
    Fix Version/s: 1.13.0
       Resolution: Fixed

> Avoid copying strings from dictionary or plain-encoded blocks
> -------------------------------------------------------------
>
>                 Key: KUDU-2844
>                 URL: https://issues.apache.org/jira/browse/KUDU-2844
>             Project: Kudu
>          Issue Type: Improvement
>          Components: cfile, perf
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Major
>             Fix For: 1.13.0
>
>         Attachments: fg.svg
>
>
> When scanning a plain or dictionary-encoded binary column, we currently loop
> over each entry and copy the string into the destination RowBlock's arena. In
> TPCH Q1, the scanner threads use a significant percentage of CPU doing this
> copying, and it also increases CPU cache footprint which likely decreases
> performance in downstream operations like predicate evaluation, merging,
> result serialization, etc.
> Instead of doing this, we could "attach" the dictionary block (with
> ref-counting) to the RowBlock and refer directly to the dictionary entry from
> the RowBlock. When the RowBlock eventually is reset, we can drop the
> reference. This should be safe because we never mutate indirect data in-place.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
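
The "attach with ref-counting" idea in the description can be illustrated with a minimal, self-contained sketch. This is not Kudu's code: DictBlock and StringColumn are hypothetical stand-ins for a decoded dictionary block and a RowBlock's binary column, and std::shared_ptr and std::string_view are assumed here in place of whatever ref-counting and slice types the real implementation uses. The sketch only shows the ownership pattern: cells point into the dictionary, the column holds a reference that keeps the dictionary alive, and resetting the column drops that reference.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for a decoded dictionary block. It is immutable once
// built, which is what makes sharing its bytes safe.
class DictBlock {
 public:
  explicit DictBlock(std::vector<std::string> words) : words_(std::move(words)) {}
  std::string_view Get(std::size_t code) const { return words_.at(code); }

 private:
  const std::vector<std::string> words_;
};

// Hypothetical stand-in for the binary column of a RowBlock.
class StringColumn {
 public:
  // Copying path: every value is duplicated into storage owned by the column.
  void AppendCopy(std::string_view value) {
    owned_.push_back(std::make_unique<std::string>(value));
    cells_.push_back(*owned_.back());
  }

  // Attach path: take one reference on the dictionary, then point each cell
  // directly at the dictionary entry without copying any string bytes.
  void AppendCodes(const std::shared_ptr<const DictBlock>& dict,
                   const std::vector<std::uint32_t>& codes) {
    assert(dict != nullptr);
    attached_.push_back(dict);
    for (std::uint32_t code : codes) {
      cells_.push_back(dict->Get(code));
    }
  }

  // Resetting the column drops the cells first, then the references that
  // kept the attached dictionaries alive.
  void Reset() {
    cells_.clear();
    owned_.clear();
    attached_.clear();
  }

  std::string_view cell(std::size_t i) const { return cells_.at(i); }
  std::size_t size() const { return cells_.size(); }

 private:
  std::vector<std::string_view> cells_;
  std::vector<std::unique_ptr<std::string>> owned_;         // copied values
  std::vector<std::shared_ptr<const DictBlock>> attached_;  // shared blocks
};
```

In this sketch a caller can release its own handle to the dictionary right after AppendCodes; the cells stay valid until Reset, and repeated codes resolve to the same underlying bytes rather than to separate copies.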