[ https://issues.apache.org/jira/browse/KUDU-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon reassigned KUDU-2844:
---------------------------------
Assignee: Todd Lipcon
> Avoid copying strings from dictionary or plain-encoded blocks
> -------------------------------------------------------------
>
> Key: KUDU-2844
> URL: https://issues.apache.org/jira/browse/KUDU-2844
> Project: Kudu
> Issue Type: Improvement
> Components: cfile, perf
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Major
> Attachments: fg.svg
>
>
> When scanning a plain or dictionary-encoded binary column, we currently loop
> over each entry and copy the string into the destination RowBlock's arena. In
> TPCH Q1, the scanner threads spend a significant percentage of their CPU time
> on this copying. It also increases the CPU cache footprint, which likely hurts
> performance in downstream operations such as predicate evaluation, merging,
> and result serialization.
> Instead, we could "attach" the dictionary block (with ref-counting) to the
> RowBlock and have the RowBlock refer directly to the dictionary entries. When
> the RowBlock is eventually reset, we can drop the reference. This should be
> safe because we never mutate indirect data in place.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)