[jira] [Commented] (PHOENIX-1103) Remove hash join special case for ChunkedResultIterator

Gabriel Reid (JIRA) Mon, 21 Jul 2014 22:23:26 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069839#comment-14069839
 ]


Gabriel Reid commented on PHOENIX-1103:
---------------------------------------

Thanks for taking a look [~jamestaylor].

{quote}I may have missed it, but don't you need to not add a zero byte to the 
last key, because you want to process that row again{quote}

The {{getLastKey()}} method currently returns the last key that was actually 
consumed by the iterator, not the last key that was seen, so it's still 
necessary to append a null byte to the return value of {{getLastKey()}} in 
order to restart the scan on the next chunk.
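
To illustrate the point about the null byte, here is a minimal sketch in plain Java. It is not the code from the patch. The class name {{NextKeyExample}} and the helper {{appendZeroByte}} are made up for this example. The idea is that appending a single 0x00 byte to the last consumed key gives the smallest key that sorts strictly after it, so a scan restarted from that key skips the row already consumed.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the Phoenix implementation.
public class NextKeyExample {

    /** Returns a copy of key with a single 0x00 byte appended. */
    public static byte[] appendZeroByte(byte[] key) {
        // Arrays.copyOf zero-fills the extra trailing position.
        return Arrays.copyOf(key, key.length + 1);
    }

    public static void main(String[] args) {
        // Assumed example: the last row key consumed in the previous chunk.
        byte[] lastConsumed = {'r', 'o', 'w', '1'};
        // Start key for the scan over the next chunk.
        byte[] restartKey = appendZeroByte(lastConsumed);
        System.out.println(Arrays.toString(restartKey));
    }
}
```

If {{getLastKey()}} instead returned the last key that was seen but not yet consumed, the key would be used as-is so that the row is processed again.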

{quote}Also minor, but consider making the two ImmutableBytesWritable pointers 
member variables so you don't reallocate them again and again{quote}

Good point. I didn't think this through much in the original patch, and at 
first the idea of making these member variables seems to make a lot of sense, 
but digging into it a bit more makes me wonder whether it's worthwhile. The 
{{rowKeyChanged}} method is only called once a chunk's size has been reached, 
so in all cases except hash joins it will be called only once per chunk. In 
the case of hash joins it will be called one or several times per chunk, but 
probably still only a small number of times. Making those two 
ImmutableBytesWritables member variables will push them into survivor GC 
space, which I guess will have its own overhead as well (although also very 
minor). I have the feeling that I'm really over-thinking this now, but do you 
think it's still worth making them member variables, considering they'll 
typically only be allocated once per instance?
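
The calling pattern described above can be sketched roughly as follows. This is a simplified illustration, not the patch itself. It uses {{String}} row keys instead of byte arrays, and the class name {{ChunkBoundaryExample}} and the method {{chunk}} are hypothetical. The key comparison runs only after the chunk has reached its target size, which is why it happens once per chunk for ordinary scans and a few times per chunk when consecutive rows share a row key, as with hash joins.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the Phoenix implementation.
public class ChunkBoundaryExample {

    /**
     * Splits row keys (assumed to be in scan order) into chunks of at least
     * chunkSize rows, ending a chunk only where the row key changes.
     */
    public static List<List<String>> chunk(List<String> rowKeys, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        List<List<String>> chunks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String key : rowKeys) {
            // The size check short-circuits, so keys are compared only once
            // the current chunk is full.
            if (current.size() >= chunkSize
                    && !key.equals(current.get(current.size() - 1))) {
                chunks.add(current);
                current = new ArrayList<>();
            }
            current.add(key);
        }
        if (!current.isEmpty()) {
            chunks.add(current);
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Rows "b" repeat, as they might in a hash join result.
        List<String> keys = List.of("a", "a", "b", "b", "b", "c");
        System.out.println(chunk(keys, 2));
    }
}
```

With a chunk size of 2, the three "b" rows stay together in one chunk even though that chunk ends up larger than the target size.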



> Remove hash join special case for ChunkedResultIterator
> -------------------------------------------------------
>
>                 Key: PHOENIX-1103
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1103
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Gabriel Reid
>            Assignee: Gabriel Reid
>             Fix For: 5.0.0, 3.1, 4.1
>
>         Attachments: PHOENIX-1103.patch
>
>
> This is a follow-up issue to PHOENIX-539. There is currently a special case 
> which disables the ChunkedResultIterator in the case of a hash join. This 
> disabling of the ChunkedResultIterator is needed due to the fact that a hash 
> join scan can return multiple rows with the same row key.
> As discussed in the comments of PHOENIX-539, the ChunkedResultIterator should 
> be updated to only end a chunk between different row keys.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
