Thanks for your thoughtful response, Viraj. I have added my thoughts below.
On Wed, Nov 19, 2025 at 2:38 PM Viraj Jasani <[email protected]> wrote:

> We also need to understand: what happens when hbase client gets heartbeat
> and the region moves?
>

I have checked that code in HBase, and the HBase client seems to handle
this case transparently. We may of course find bugs, but handling that is
part of the design.

> On Wed, Nov 19, 2025 at 7:05 PM Viraj Jasani <[email protected]> wrote:
>
> > Istvan, I think we should also involve dev@hbase and see what guidelines
> > we are recommending so far for coprocs that would like to implement
> > timeout features for long running scans, wdyt?
>

Based on my current understanding, if the Scan / ScannerContext is correctly
set up (allows partial rows, sets the time limit and requests a cursor),
HBase will honor that and the Scan will return a heartbeat result when it
times out. I THINK that's all we need. Of course if we get stuck we should
ask for help.

> > On Wed, Nov 19, 2025 at 6:51 PM Viraj Jasani <[email protected]> wrote:
> >
> >> Thank you for starting this thread, Istvan!
> >>
> >> This is an important issue. I have recently come across data correctness
> >> issues with PHOENIX-7733, to be fixed by HBASE-29722. This also got me
> >> thinking about the heartbeat and dummy cell overlap leading to possible
> >> data correctness issues.
> >>
> >> > I propose dropping the dummy cell mechanics from Phoenix, and using the
> >> > HBase keepalive/cursor mechanics instead (we may not even need the
> >> > cursors).
> >>
> >> +1
> >>
> >> > If we cannot find a better way to shortcut some processing in Phoenix we
> >> > may need to keep dummy cells internally, but we have to make sure that
> >> > they never appear on the wire and reach the client.
> >>
> >> I don't think it is possible for Phoenix to ensure a dummy cell never
> >> reaches the HBase client.
>

I think if nothing else works, we can still catch and filter/convert them in
RegionObserver.postScannerNext().
Of course ideally we would never generate any Dummy cells in the first place.

> >> > in that case we'd need
> >> > to check and convert to a heartbeat scan result somehow
> >>
> >> This needs changes in HBase only, which I don't think HBase would
> >> (should) allow.
> >>
> >> > Is Hbase 2/3 wire compatible enough that connecting with HBase 2.x
> >> > clients to Hbase 3 even a possibility ?
> >>
> >> Yes, wire compatibility is important. When this happens, the only thing
> >> we can do is set the page timeout high enough that we never have to send
> >> the dummy result to the client, or disable the paging feature.
> >>
> >>
> >> On Thu, Nov 13, 2025 at 11:22 PM Istvan Toth <[email protected]> wrote:
> >>
> >>> I've been struggling with errors on the region moving tests on my HBase
> >>> 3.0 WIP branch and have finally tracked the problems down to Phoenix's
> >>> dummy Cells (as well as some built-in assumptions in Phoenix which are
> >>> not true for Hbase 3, see PHOENIX-7728
> >>> <https://issues.apache.org/jira/browse/PHOENIX-7728>)
> >>>
> >>> HBase is not aware that these are dummy cells, and is considering the
> >>> rows as already processed when retrying scans after the region goes away
> >>> from under the scan, i.e. it restarts the scan from AFTER the dummy
> >>> cell's rowkey, leading to the scan skipping rows.
> >>>
> >>> I have been able to fix the tests by hacking Hbase to ignore these dummy
> >>> cells (and fixing the phoenix side problems described in PHOENIX-7728
> >>> <https://issues.apache.org/jira/browse/PHOENIX-7728>), but I don't think
> >>> that hacking HBase to work with dummy cells is the way to go (or even if
> >>> that would be accepted by HBase).
> >>>
> >>> AFAIU the dummy cells were added back in the HBase 1.x when there was no
> >>> other way to ensure timely responses from the server.
> >>>
> >>> HBase 2 has introduced the keepalive/cursor mechanics, which IUC serves
> >>> the exact same purpose at the Phoenix dummy cells.
> >>>
> >>> I propose dropping the dummy cell mechanics from Phoenix, and using the
> >>> HBase keepalive/cursor mechanics instead (we may not even need the
> >>> cursors).
> >>>
> >>> If we cannot find a better way to shortcut some processing in Phoenix we
> >>> may need to keep dummy cells internally, but we have to make sure that
> >>> they never appear on the wire and reach the client. (i.e. in that case
> >>> we'd need to check and convert to a heartbeat scan result somehow)
> >>>
> >>> We will also need to consider backwards compatibility.
> >>>
> >>> Is Hbase 2/3 wire compatible enough that connecting with HBase 2.x
> >>> clients to Hbase 3 even a possibility ?
> >>>
> >>> Do we want to support that ?
> >>>
> >>> When using Hbase 2.x, if Phoenix starts to use the HBase keepalive
> >>> mechanics, will old clients work with that without changes, or do we
> >>> need to keep sending Dummy cells for older clients ?
> >>>
> >>> Looking forward to hearing your take,
> >>>
> >>> Istvan
> >>>

--
*István Tóth* | Sr. Staff Software Engineer
*Email*: [email protected]
cloudera.com <https://www.cloudera.com>
