[
https://issues.apache.org/jira/browse/PHOENIX-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087995#comment-16087995
]
Sergey Soldatov commented on PHOENIX-4018:
------------------------------------------
[[email protected]] That's an interesting problem. I think that none of our inner
scanners should use the scanner context (i.e. they should use the NoLimit one),
but the ones that actually return something should properly update the
progress. So, basically it should look like this:
Any of our scanners' nextRow(List<Cell> cells, ScannerContext context) should
call innerScanner.nextRow(cells). On return, it should update the context
progress.
In this case RSRpcServices will be able to correctly track the results in terms
of size/time. At the same time, since the inner scanners are not using limits,
there will be no chance of receiving a partial result.
Is there anything that I missed?
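A minimal sketch of the idea, using simplified stand-in classes rather than the
real HBase/Phoenix types. Progress, InnerScanner, and JoinScanner are
hypothetical names invented for illustration, cells are plain strings, and cell
size is approximated by string length. Here nextRow returns true when a row was
produced, which is an assumption of this sketch and not the HBase contract.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ScannerProgressSketch {

    /** Hypothetical stand-in for the caller's scanner context: bytes returned vs. a limit. */
    static final class Progress {
        final long sizeLimit;
        long sizeProgress;
        Progress(long sizeLimit) { this.sizeLimit = sizeLimit; }
        boolean limitReached() { return sizeProgress >= sizeLimit; }
    }

    /** Hypothetical inner scanner: it never sees a limit, so it only returns whole rows. */
    static final class InnerScanner {
        private final Iterator<List<String>> rows;
        InnerScanner(List<List<String>> rows) { this.rows = rows.iterator(); }

        /** Appends the next complete row to cells; returns false when no rows remain. */
        boolean nextRow(List<String> cells) {
            if (!rows.hasNext()) {
                return false;
            }
            cells.addAll(rows.next());
            return true;
        }
    }

    /** Hypothetical outer scanner: skips non-matching rows, charges only returned rows. */
    static final class JoinScanner {
        private final InnerScanner inner;
        private final String joinKey;
        JoinScanner(InnerScanner inner, String joinKey) {
            this.inner = inner;
            this.joinKey = joinKey;
        }

        boolean nextRow(List<String> cells, Progress context) {
            List<String> row = new ArrayList<>();
            while (inner.nextRow(row)) {            // no limit passed to the inner scanner
                if (row.get(0).equals(joinKey)) {   // first cell plays the role of the row key
                    cells.addAll(row);
                    for (String cell : row) {
                        context.sizeProgress += cell.length(); // update the caller's progress
                    }
                    return true;
                }
                row.clear();                        // skipped rows are not charged to the caller
            }
            return false;
        }
    }

    public static void main(String[] args) {
        List<List<String>> region = Arrays.asList(
                Arrays.asList("k1", "a", "b"),
                Arrays.asList("k2", "c", "d"),
                Arrays.asList("k3", "e", "f"));     // the only matching key is at the end
        JoinScanner scanner = new JoinScanner(new InnerScanner(region), "k3");
        Progress context = new Progress(4);
        List<String> cells = new ArrayList<>();
        boolean found = scanner.nextRow(cells, context);
        System.out.println(found + " " + cells + " progress=" + context.sizeProgress);
    }
}
```

The point of the split is that only the outer scanner touches the caller's
context, so the caller can still enforce its limits on what is actually
returned, while the inner scan can never stop in the middle of a row.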
> HashJoin may produce nulls for LHS table columns
> ------------------------------------------------
>
> Key: PHOENIX-4018
> URL: https://issues.apache.org/jira/browse/PHOENIX-4018
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.11.0
> Reporter: Sergey Soldatov
> Assignee: Sergey Soldatov
> Priority: Critical
> Attachments: PHOENIX-4018-1.patch
>
>
> Here is the problem: in HashJoinRegionScanner methods (nextRow, for example)
> we are using the same scanner context that was created in RSRpcServices. It
> has limits (e.g. a 2 MB size limit). Let's say that we have a 3 MB region and
> the only key that matches the join condition is located at the end of the
> region. In HashJoinRegionScanner#nextRow, as we iterate through the region
> rows, once we reach the 2 MB limit every region scanner nextRow call will
> return a single cell and the scanner context will be in the
> SIZE_LIMIT_REACHED_MID_ROW state. But we don't have any logic that checks for
> that, so this single cell is treated as a complete row with all nulls except
> one column.
> How to fix it:
> 1. For the region scanner we can provide NoLimitScannerContext, so we will
> never get a partial result.
> 2. We need to update the scanner context that we got from RSRpcServices with
> the real data, based on the size of the results we are going to return.