[
https://issues.apache.org/jira/browse/PHOENIX-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069839#comment-14069839
]
Gabriel Reid commented on PHOENIX-1103:
---------------------------------------
Thanks for taking a look [~jamestaylor].
{quote}I may have missed it, but don't you need to not add a zero byte to the
last key, because you want to process that row again?{quote}
The {{getLastKey()}} method currently returns the last key that was actually
consumed by the iterator, not the last key that was seen. It's therefore still
necessary to append a null byte to the value returned by {{getLastKey()}} so
that the scan for the next chunk restarts just after that row instead of
reading it again.
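A minimal sketch of the restart-key computation, assuming row keys are plain byte arrays compared as unsigned bytes (as HBase does). {{nextStartKey}} is a hypothetical helper, not Phoenix's API. Appending a 0x00 byte gives the smallest key that sorts strictly after the last consumed key, so a scan started there skips exactly that row.

```java
import java.util.Arrays;

public class NextChunkStartKey {
    // Hypothetical helper (not Phoenix's API): given the last row key that the
    // iterator actually consumed, return the key to restart the next chunk's
    // scan from. Appending a 0x00 byte yields the smallest key that sorts
    // strictly after lastConsumedKey, so that row is not read a second time.
    static byte[] nextStartKey(byte[] lastConsumedKey) {
        // Arrays.copyOf pads the new trailing slot with 0 (the "null byte").
        return Arrays.copyOf(lastConsumedKey, lastConsumedKey.length + 1);
    }

    public static void main(String[] args) {
        byte[] last = {'r', 'o', 'w', '1'};
        byte[] next = nextStartKey(last);
        System.out.println(Arrays.toString(next)); // [114, 111, 119, 49, 0]
        // Unsigned lexicographic order, as used for HBase row keys (Java 9+).
        System.out.println(Arrays.compareUnsigned(last, next) < 0); // true
    }
}
```

Under unsigned lexicographic ordering a key that is a strict prefix of another sorts first, which is why the padded key lands immediately after the original.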
{quote}Also minor, but consider making the two ImmutableBytesWritable pointers
member variables so you don't reallocate them again and again{quote}
Good point. I didn't think this through much in the original patch. At first,
making these member variables seemed to make a lot of sense, but digging into
it a bit more makes me wonder whether it does. The {{rowKeyChanged}} method is
only called once a chunk's size limit has been reached, so in all cases except
hash joins it will only be called once per chunk. In the case of hash joins it
may be called one or more times per chunk, but probably still only a small
number of times. Adding those two ImmutableBytesWritables as member variables
will push them into survivor GC space, which I guess has its own (also very
minor) overhead. I have the feeling that I'm really over-thinking this now,
but I was wondering whether you think it's still worth making them member
variables, considering they'll typically only be allocated once per instance.
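As a rough illustration of the call-count argument, here is a self-contained sketch, not the actual patch. {{KeyPtr}} is a hypothetical stand-in for ImmutableBytesWritable, and {{countRowKeyChangedCalls}} is a hypothetical driver that only consults {{rowKeyChanged}} once the current chunk has reached its size. With unique row keys there is one call per chunk boundary; with hash-join style duplicate keys there are a few extra calls around the boundary.

```java
import java.util.Arrays;
import java.util.List;

public class RowKeyChangedCalls {
    // Hypothetical stand-in for HBase's ImmutableBytesWritable: a
    // (bytes, offset, length) view over a byte array.
    static final class KeyPtr {
        final byte[] bytes;
        final int offset;
        final int length;

        KeyPtr(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }
    }

    private int rowKeyChangedCalls = 0;

    // Allocates two short-lived pointers on every call instead of reusing
    // member variables, so allocations == number of calls.
    boolean rowKeyChanged(byte[] previousKey, byte[] currentKey) {
        rowKeyChangedCalls++;
        KeyPtr prev = new KeyPtr(previousKey, 0, previousKey.length);
        KeyPtr curr = new KeyPtr(currentKey, 0, currentKey.length);
        // Range-based Arrays.equals requires Java 9+.
        return !Arrays.equals(prev.bytes, prev.offset, prev.offset + prev.length,
                curr.bytes, curr.offset, curr.offset + curr.length);
    }

    // Walks the rows in order. rowKeyChanged is only consulted once the current
    // chunk holds at least chunkSize rows; a new chunk starts at the first row
    // whose key differs from the previous one. Returns the number of calls.
    int countRowKeyChangedCalls(List<byte[]> rowKeys, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        rowKeyChangedCalls = 0;
        int rowsInChunk = 0;
        byte[] previous = null;
        for (byte[] key : rowKeys) {
            if (rowsInChunk >= chunkSize && rowKeyChanged(previous, key)) {
                rowsInChunk = 0; // this row opens a new chunk
            }
            rowsInChunk++;
            previous = key;
        }
        return rowKeyChangedCalls;
    }

    public static void main(String[] args) {
        byte[] a = {'a'}, b = {'b'}, c = {'c'}, d = {'d'};
        RowKeyChangedCalls it = new RowKeyChangedCalls();
        // Unique row keys: one call per chunk boundary.
        System.out.println(it.countRowKeyChangedCalls(List.of(a, b, c, d), 2)); // 1
        // Hash-join style duplicate keys: a few calls around a boundary.
        System.out.println(it.countRowKeyChangedCalls(List.of(a, a, a, b, b, c), 2)); // 3
    }
}
```

In this sketch the per-call allocation cost scales with the number of chunk boundaries, not with the number of rows, which is the point behind keeping the pointers local.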
> Remove hash join special case for ChunkedResultIterator
> -------------------------------------------------------
>
> Key: PHOENIX-1103
> URL: https://issues.apache.org/jira/browse/PHOENIX-1103
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Gabriel Reid
> Assignee: Gabriel Reid
> Fix For: 5.0.0, 3.1, 4.1
>
> Attachments: PHOENIX-1103.patch
>
>
> This is a follow-up issue to PHOENIX-539. There is currently a special case
> which disables the ChunkedResultIterator in the case of a hash join. This
> is needed because a hash join scan can return multiple rows with the same
> row key.
> As discussed in the comments of PHOENIX-539, the ChunkedResultIterator should
> be updated to only end a chunk between different row keys.
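To illustrate one way the hash join problem can show up, here is a simplified simulation under stated assumptions: {{scanFrom}} and {{rowsRead}} are hypothetical helpers, not Phoenix's ChunkedResultIterator, and each chunk's scan restarts at the last consumed key plus a null byte. With fixed-size chunks, a chunk that ends inside a run of equal row keys causes the remaining rows with that key to be skipped; ending chunks only where the row key changes avoids this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChunkingByRowKey {
    // Simulated scan (hypothetical): every row whose key is >= startKey, in
    // sorted order, compared as unsigned bytes (Java 9+).
    static List<byte[]> scanFrom(List<byte[]> sortedRows, byte[] startKey) {
        List<byte[]> out = new ArrayList<>();
        for (byte[] row : sortedRows) {
            if (Arrays.compareUnsigned(row, startKey) >= 0) {
                out.add(row);
            }
        }
        return out;
    }

    // Reads all rows chunk by chunk, restarting each chunk's scan at the last
    // consumed key plus a 0x00 byte. When rowKeyAware is false a chunk ends
    // after exactly chunkSize rows, even in the middle of a run of equal keys.
    static int rowsRead(List<byte[]> sortedRows, int chunkSize, boolean rowKeyAware) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        int total = 0;
        byte[] startKey = new byte[0];
        while (true) {
            List<byte[]> remaining = scanFrom(sortedRows, startKey);
            if (remaining.isEmpty()) {
                return total;
            }
            int end = Math.min(chunkSize, remaining.size());
            if (rowKeyAware) {
                // Only end the chunk where the row key changes.
                while (end < remaining.size()
                        && Arrays.equals(remaining.get(end), remaining.get(end - 1))) {
                    end++;
                }
            }
            total += end;
            byte[] lastKey = remaining.get(end - 1);
            startKey = Arrays.copyOf(lastKey, lastKey.length + 1);
        }
    }

    public static void main(String[] args) {
        byte[] a = {'a'}, b = {'b'}, c = {'c'};
        // A hash join can emit several rows with the same row key.
        List<byte[]> rows = List.of(a, b, b, b, c);
        System.out.println(rowsRead(rows, 2, false)); // 3: two 'b' rows are skipped
        System.out.println(rowsRead(rows, 2, true));  // 5: every row is read
    }
}
```

The row-key-aware variant lets a chunk grow past {{chunkSize}} when a run of equal keys straddles the limit, trading a slightly larger chunk for not losing rows.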
--
This message was sent by Atlassian JIRA
(v6.2#6252)