Istvan, I think we should also involve dev@hbase and see what guidelines are
recommended so far for coprocs that would like to implement timeout
features for long-running scans, wdyt?

On Wed, Nov 19, 2025 at 6:51 PM Viraj Jasani <[email protected]> wrote:

> Thank you for starting this thread, Istvan!
>
> This is an important issue. I have recently come across data correctness
> issues with PHOENIX-7733, to be fixed by HBASE-29722. This also got me
> thinking about the heartbeat and dummy cell overlap leading to possible
> data correctness issues.
>
> > I propose dropping the dummy cell mechanics from Phoenix, and using the
> > HBase keepalive/cursor mechanics instead (we may not even need the
> > cursors).
>
> +1
>
> > If we cannot find a better way to shortcut some processing in Phoenix we
> > may need to keep dummy cells internally, but we have to make sure that
> > they never appear on the wire and reach the client.
>
> I don't think it is possible for Phoenix to ensure a dummy cell never
> reaches the HBase client.
>
> > in that case we'd need
> > to check and convert to a heartbeat scan result somehow
>
> This needs changes in HBase only, which I don't think HBase would (should)
> allow.
>
> > Is HBase 2/3 wire compatible enough that connecting with HBase 2.x
> > clients to HBase 3 is even a possibility?
>
> Yes, wire compatibility is important. When this happens, the only thing we
> can do is set the page timeout high enough that we never have to send the
> dummy result to the client, or disable the paging feature.
>
>
> On Thu, Nov 13, 2025 at 11:22 PM Istvan Toth <[email protected]> wrote:
>
>> I've been struggling with errors on the region moving tests on my HBase 3.0
>> WIP branch and have finally tracked the problems down to Phoenix's dummy
>> cells (as well as some built-in assumptions in Phoenix which are not true
>> for HBase 3, see PHOENIX-7728
>> <https://issues.apache.org/jira/browse/PHOENIX-7728>).
>>
>> HBase is not aware that these are dummy cells, and is considering the rows
>> as already processed when retrying scans after the region goes away from
>> under the scan, i.e. it restarts the scan from AFTER the dummy cell's
>> rowkey, leading to the scan skipping rows.
>>
>> I have been able to fix the tests by hacking HBase to ignore these dummy
>> cells (and fixing the Phoenix-side problems described in PHOENIX-7728
>> <https://issues.apache.org/jira/browse/PHOENIX-7728>), but I don't think
>> that hacking HBase to work with dummy cells is the way to go (or that it
>> would even be accepted by HBase).
>>
>> AFAIU the dummy cells were added back in the HBase 1.x days, when there
>> was no other way to ensure timely responses from the server.
>>
>> HBase 2 has introduced the keepalive/cursor mechanics, which IIUC serve
>> the exact same purpose as the Phoenix dummy cells.
>>
>> I propose dropping the dummy cell mechanics from Phoenix, and using the
>> HBase keepalive/cursor mechanics instead (we may not even need the
>> cursors).
>>
>> If we cannot find a better way to shortcut some processing in Phoenix we
>> may need to keep dummy cells internally, but we have to make sure that
>> they never appear on the wire and reach the client. (i.e. in that case
>> we'd need to check and convert to a heartbeat scan result somehow)
>>
>> We will also need to consider backwards compatibility.
>>
>> Is HBase 2/3 wire compatible enough that connecting with HBase 2.x clients
>> to HBase 3 is even a possibility?
>>
>> Do we want to support that?
>>
>> When using HBase 2.x, if Phoenix starts to use the HBase keepalive
>> mechanics, will old clients work with that without changes, or do we need
>> to keep sending dummy cells for older clients?
>>
>> Looking forward to hearing your take,
>>
>> Istvan
>>
>
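
For readers following the thread, the row-skipping failure described in
Istvan's message can be illustrated with a toy model. The Java sketch below is
not HBase or Phoenix code: the class DummyCellSkipDemo, the Result holder, and
the five-row region are all hypothetical, and it assumes the dummy cell carries
the key of a row that has not yet been returned to the client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Toy model only: none of these types correspond to real HBase or Phoenix classes.
class DummyCellSkipDemo {

    // One entry in a scan response: a row key, flagged as real data or a dummy placeholder.
    static final class Result {
        final String rowKey;
        final boolean dummy;

        Result(String rowKey, boolean dummy) {
            this.rowKey = rowKey;
            this.dummy = dummy;
        }
    }

    // Hypothetical region contents, sorted by row key.
    static final TreeSet<String> REGION =
            new TreeSet<>(Arrays.asList("r1", "r2", "r3", "r4", "r5"));

    // Simulates a scan that is interrupted by a region move right after the
    // server has sent a dummy cell for r3 (assumed: r3 itself was not returned yet).
    // dummyAware = false models a client that treats the dummy like any other row.
    static List<String> scan(boolean dummyAware) {
        List<String> seen = new ArrayList<>();
        String resumeAfter = null; // last row key the client believes is fully processed

        List<Result> firstSession = Arrays.asList(
                new Result("r1", false),
                new Result("r2", false),
                new Result("r3", true)); // page timeout: dummy carrying r3's key

        for (Result r : firstSession) {
            if (!r.dummy) {
                seen.add(r.rowKey);
            }
            // A dummy-unaware client advances its resume point on every result.
            if (!r.dummy || !dummyAware) {
                resumeAfter = r.rowKey;
            }
        }

        // The region moves; the client reopens the scan strictly AFTER resumeAfter.
        seen.addAll(REGION.tailSet(resumeAfter, false));
        return seen;
    }
}
```

In this model, scan(false) never returns r3, because the retry starts after the
dummy cell's row key, while scan(true) returns all five rows. Which layer
should play the dummy-aware role is exactly what the thread is debating.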
