[ https://issues.apache.org/jira/browse/KUDU-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon updated KUDU-2844:
------------------------------
    Attachment: fg.svg

> Avoid copying strings from dictionary or plain-encoded blocks
> -------------------------------------------------------------
>
>                 Key: KUDU-2844
>                 URL: https://issues.apache.org/jira/browse/KUDU-2844
>             Project: Kudu
>          Issue Type: Improvement
>          Components: cfile, perf
>            Reporter: Todd Lipcon
>            Priority: Major
>         Attachments: fg.svg
>
>
> When scanning a plain or dictionary-encoded binary column, we currently loop
> over each entry and copy the string into the destination RowBlock's arena. In
> TPC-H Q1, the scanner threads spend a significant percentage of their CPU time
> on this copying. The copies also enlarge the CPU cache footprint, which likely
> slows downstream operations such as predicate evaluation, merging, and result
> serialization.
> Instead, we could "attach" the dictionary block (with ref-counting) to the
> RowBlock and have the RowBlock refer directly to the dictionary entries. When
> the RowBlock is eventually reset, it drops the reference. This should be safe
> because we never mutate indirect data in place.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
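The "attach the block and refer to its entries" idea from the issue can be sketched as follows. This is only a minimal illustration in Python, not Kudu's C++ code: `DictBlock`, `RowBlock`, `append_copy`, `append_ref`, and `reset` are hypothetical stand-ins. A `memoryview` slice plays the role of a pointer into the shared dictionary buffer. The explicit `_attached` list mirrors the proposed ref-counted attachment (Python's reference counting keeps the block alive while it is listed). The immutability of `bytes` stands in for the "we never mutate indirect data in place" guarantee that makes sharing safe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictBlock:
    """Hypothetical decoded dictionary block: every entry is packed into one
    immutable buffer, with (start, end) offsets indexed by dictionary code."""
    data: bytes
    offsets: tuple

    @classmethod
    def from_entries(cls, entries):
        buf, offsets, pos = bytearray(), [], 0
        for entry in entries:
            buf += entry
            offsets.append((pos, pos + len(entry)))
            pos += len(entry)
        return cls(bytes(buf), tuple(offsets))


class RowBlock:
    """Hypothetical stand-in for one binary column of a RowBlock."""

    def __init__(self):
        self.cells = []        # one value per row
        self.arena_bytes = 0   # bytes duplicated into this block's own storage
        self._attached = []    # dictionary blocks this RowBlock keeps alive

    def append_copy(self, block, code):
        # Current behavior: copy the entry's bytes for every row.
        start, end = block.offsets[code]
        value = memoryview(block.data)[start:end].tobytes()  # new bytes object
        self.arena_bytes += len(value)
        self.cells.append(value)

    def append_ref(self, block, code):
        # Proposed behavior: attach the dictionary block once, then store a
        # zero-copy view pointing at the entry inside the shared buffer.
        if not self._attached or self._attached[-1] is not block:
            self._attached.append(block)
        start, end = block.offsets[code]
        self.cells.append(memoryview(block.data)[start:end])

    def reset(self):
        # Dropping the views and attachments releases this RowBlock's
        # references to the dictionary blocks.
        self.cells.clear()
        self._attached.clear()
        self.arena_bytes = 0


dict_block = DictBlock.from_entries([b"AIR", b"RAIL", b"SHIP"])
codes = [0, 2, 2, 1, 2]  # dictionary codes for five rows

copied, referenced = RowBlock(), RowBlock()
for code in codes:
    copied.append_copy(dict_block, code)
    referenced.append_ref(dict_block, code)

print(copied.arena_bytes)          # 19: bytes duplicated by the copy path
print(referenced.arena_bytes)      # 0: views point into dict_block.data
print(bytes(referenced.cells[1]))  # b'SHIP'
```

In this sketch the copy path's cost grows with the number of rows, while the reference path pays for the dictionary buffer once per attached block and stores only a small view per row.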