[ https://issues.apache.org/jira/browse/KUDU-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17178067#comment-17178067 ]
ASF subversion and git services commented on KUDU-2844:
-------------------------------------------------------
Commit 7b832c9c31975d691c4dc005631f1bb47aa6ed30 in kudu's branch
refs/heads/master from Todd Lipcon
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=7b832c9 ]
KUDU-2844 (1/3): make BlockHandle ref-counted
This is the first in a series of patches that will lead up to allowing a
BlockDecoder to take a reference to a data block and attach it to a
RowBlock when decoding rows, ensuring that the data block doesn't get
deallocated until the RowBlock has been serialized back to the client.
Note that the input to block decoders can be either references to blocks
in the block cache, or "owned" blocks which hold onto the memory
directly. As such, we need to ref-count the BlockHandle abstraction
rather than adding an additional reference to the already-ref-counted
BlockCacheHandle.
To accomplish this, this patch changes BlockHandle to be a heap-allocated
refcounted object. It also changes the various BlockDecoders to take in
a moved BlockHandle instead of just the Slice.
Change-Id: I1077fcc841ca31a2cb523769fffeed2d27782bc1
Reviewed-on: http://gerrit.cloudera.org:8080/15800
Reviewed-by: Andrew Wong <[email protected]>
Reviewed-by: Alexey Serbin <[email protected]>
Tested-by: Kudu Jenkins
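The ownership model described in the commit message can be sketched roughly
as follows. This is a minimal illustration, not Kudu's actual code: the
BlockHandle, BlockHandleRef, and PlainDecoder definitions below are
hypothetical stand-ins, std::shared_ptr is assumed in place of whatever
ref-counting primitive the patch uses, and a shared std::string stands in
for a block cache entry.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <string_view>
#include <utility>

class BlockHandle;
// Assumption: std::shared_ptr supplies the reference count in this sketch.
using BlockHandleRef = std::shared_ptr<const BlockHandle>;

// Hypothetical heap-allocated, ref-counted handle to block data. The data is
// either owned directly or kept alive through a reference to a cache entry.
class BlockHandle {
 public:
  // "Owned" block: the handle holds the memory itself.
  static BlockHandleRef WithOwnedData(std::string data) {
    return BlockHandleRef(new BlockHandle(std::move(data), nullptr));
  }

  // Cache-backed block: the handle pins the (stand-in) cache entry.
  static BlockHandleRef WithCachedData(std::shared_ptr<const std::string> entry) {
    assert(entry != nullptr);
    return BlockHandleRef(new BlockHandle(std::string(), std::move(entry)));
  }

  // Read-only view of the block bytes, valid while the handle is alive.
  std::string_view data() const {
    return cached_ ? std::string_view(*cached_) : std::string_view(owned_);
  }

 private:
  BlockHandle(std::string owned, std::shared_ptr<const std::string> cached)
      : owned_(std::move(owned)), cached_(std::move(cached)) {}

  std::string owned_;
  std::shared_ptr<const std::string> cached_;
};

// Hypothetical decoder that takes a moved handle rather than a bare slice,
// so it can later share the reference with whatever consumes its output.
class PlainDecoder {
 public:
  explicit PlainDecoder(BlockHandleRef block) : block_(std::move(block)) {}

  std::string_view data() const { return block_->data(); }
  const BlockHandleRef& block() const { return block_; }

 private:
  BlockHandleRef block_;
};
```

Because the count lives on the handle rather than on the cache entry, the
decoder can treat owned and cache-backed blocks the same way.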
> Avoid copying strings from dictionary or plain-encoded blocks
> -------------------------------------------------------------
>
> Key: KUDU-2844
> URL: https://issues.apache.org/jira/browse/KUDU-2844
> Project: Kudu
> Issue Type: Improvement
> Components: cfile, perf
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Major
> Attachments: fg.svg
>
>
> When scanning a plain or dictionary-encoded binary column, we currently loop
> over each entry and copy the string into the destination RowBlock's arena. In
> TPCH Q1, the scanner threads use a significant percentage of CPU doing this
> copying. It also increases the CPU cache footprint, which likely decreases
> performance in downstream operations like predicate evaluation, merging,
> result serialization, etc.
> Instead of doing this, we could "attach" the dictionary block (with
> ref-counting) to the RowBlock and refer directly to the dictionary entry from
> the RowBlock. When the RowBlock is eventually reset, we can drop the
> reference. This should be safe because we never mutate indirect data in-place.