We also need to understand what happens when the HBase client receives a heartbeat and the region then moves.

On Wed, Nov 19, 2025 at 7:05 PM Viraj Jasani <[email protected]> wrote:

> Istvan, I think we should also involve dev@hbase and see what guidelines
> we are recommending so far for coprocs that would like to implement timeout
> features for long running scans, wdyt?
>
> On Wed, Nov 19, 2025 at 6:51 PM Viraj Jasani <[email protected]> wrote:
>
>> Thank you for starting this thread, Istvan!
>>
>> This is an important issue. I have recently come across data correctness
>> issues with PHOENIX-7733, to be fixed by HBASE-29722. This also got me
>> thinking about the heartbeat and dummy cell overlap leading to possible
>> data correctness issues.
>>
>> > I propose dropping the dummy cell mechanics from Phoenix, and using the
>> > HBase keepalive/cursor mechanics instead (we may not even need the
>> > cursors).
>>
>> +1
>>
>> > If we cannot find a better way to shortcut some processing in Phoenix we
>> > may need to keep dummy cells internally, but we have to make sure that
>> > they never appear on the wire and reach the client.
>>
>> I don't think it is possible for Phoenix to ensure a dummy cell never
>> reaches the HBase client.
>>
>> > in that case we'd need
>> > to check and convert to a heartbeat scan result somehow
>>
>> This needs changes in HBase only, which I don't think HBase would
>> (should) allow.
>>
>> > Is HBase 2/3 wire compatible enough that connecting with HBase 2.x
>> > clients to HBase 3 is even a possibility?
>>
>> Yes, wire compatibility is important. When this happens, the only thing
>> we can do is set the page timeout high enough that we never have to send
>> the dummy result to the client, or disable the paging feature.
>>
>>
>> On Thu, Nov 13, 2025 at 11:22 PM Istvan Toth <[email protected]> wrote:
>>
>>> I've been struggling with errors on the region moving tests on my HBase
>>> 3.0 WIP branch and have finally tracked the problems down to Phoenix's
>>> dummy cells (as well as some built-in assumptions in Phoenix which are
>>> not true for HBase 3, see PHOENIX-7728
>>> <https://issues.apache.org/jira/browse/PHOENIX-7728>).
>>>
>>> HBase is not aware that these are dummy cells, and is considering the
>>> rows as already processed when retrying scans after the region goes away
>>> from under the scan, i.e. it restarts the scan from AFTER the dummy
>>> cell's rowkey, leading to the scan skipping rows.
>>>
>>> I have been able to fix the tests by hacking HBase to ignore these dummy
>>> cells (and fixing the Phoenix-side problems described in PHOENIX-7728
>>> <https://issues.apache.org/jira/browse/PHOENIX-7728>), but I don't think
>>> that hacking HBase to work with dummy cells is the way to go (or even
>>> that it would be accepted by HBase).
>>>
>>> AFAIU the dummy cells were added back in the HBase 1.x days, when there
>>> was no other way to ensure timely responses from the server.
>>>
>>> HBase 2 has introduced the keepalive/cursor mechanics, which IIUC serve
>>> the exact same purpose as the Phoenix dummy cells.
>>>
>>> I propose dropping the dummy cell mechanics from Phoenix, and using the
>>> HBase keepalive/cursor mechanics instead (we may not even need the
>>> cursors).
>>>
>>> If we cannot find a better way to shortcut some processing in Phoenix we
>>> may need to keep dummy cells internally, but we have to make sure that
>>> they never appear on the wire and reach the client (i.e. in that case
>>> we'd need to check and convert to a heartbeat scan result somehow).
>>>
>>> We will also need to consider backwards compatibility.
>>>
>>> Is HBase 2/3 wire compatible enough that connecting with HBase 2.x
>>> clients to HBase 3 is even a possibility?
>>>
>>> Do we want to support that?
>>>
>>> When using HBase 2.x, if Phoenix starts to use the HBase keepalive
>>> mechanics, will old clients work with that without changes, or do we need
>>> to keep sending dummy cells for older clients?
>>>
>>> Looking forward to hearing your take,
>>>
>>> Istvan
>>>
>>
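
To make the restart behaviour described in Istvan's original message concrete, here is a rough toy model in Python. It is only a sketch: none of the names below are HBase or Phoenix APIs, and the scenario (a dummy cell keyed `r3` arriving while only `r1` has actually been delivered) is an assumption made for illustration. The model also assumes a heartbeat is an empty response that carries no row key.

```python
from typing import List, Optional, Tuple

# Toy model only: none of these names are HBase or Phoenix APIs.
# A response is (row_key, kind), where kind is "data", "dummy" or "heartbeat".
Response = Tuple[Optional[str], str]


def resume_after(responses: List[Response], dummy_aware: bool) -> Optional[str]:
    """Return the row key after which a retried scan would restart."""
    last_row = None
    for row_key, kind in responses:
        if kind == "heartbeat":
            continue  # empty keepalive: carries no row, so nothing advances
        if kind == "dummy" and dummy_aware:
            continue  # a client that recognised dummy cells would ignore them
        last_row = row_key  # otherwise the row counts as already processed
    return last_row


def rows_after_retry(table: List[str], responses: List[Response],
                     dummy_aware: bool) -> List[str]:
    """Data rows received before the region moved, plus the retried scan."""
    received = [row for row, kind in responses if kind == "data"]
    restart = resume_after(responses, dummy_aware)
    remaining = [row for row in table if restart is None or row > restart]
    return received + remaining


table = ["r1", "r2", "r3", "r4", "r5"]

# Assumed scenario: r1 was delivered, the server then sent either a dummy
# cell keyed r3 or a heartbeat, and after that the region moved.
with_dummy = [("r1", "data"), ("r3", "dummy")]
with_heartbeat = [("r1", "data"), (None, "heartbeat")]

print(rows_after_retry(table, with_dummy, dummy_aware=False))      # ['r1', 'r4', 'r5']
print(rows_after_retry(table, with_heartbeat, dummy_aware=False))  # all five rows
print(rows_after_retry(table, with_dummy, dummy_aware=True))       # all five rows
```

In this model the dummy cell moves the restart point past `r2` and `r3`, so they are skipped, while the heartbeat leaves the restart point at `r1`. The `dummy_aware=True` case corresponds loosely to the "hack HBase to ignore dummy cells" workaround mentioned in the thread; whether the real client behaves this way on a heartbeat plus region move is exactly the open question above.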
