One snag is that org.apache.hadoop.hbase.regionserver.InternalScanner does not return a Result, so we need some other mechanism to pass the Cursor internally. Options I can think of:

* Keep the current Dummy cell
* Create a new Cursor Cell type based on EmptyCell for the same purpose (which would be slightly faster)
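To make the second option concrete, here is a minimal sketch of a dedicated marker cell type. It assumes simplified stand-in types: `CursorCellSketch`, `SketchCell`, `DataCell` and `CursorCell` are hypothetical, not HBase or Phoenix classes. The marker carries only the cursor row key, so callers detect it with a type check instead of inspecting a dummy cell's contents.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: CursorCellSketch, SketchCell, DataCell and CursorCell are
// illustrative stand-ins, not HBase or Phoenix classes.
final class CursorCellSketch {
    private CursorCellSketch() {}

    /** Returns the cursor row key if an inner scanner's batch is a single cursor marker. */
    static Optional<byte[]> cursorRow(List<? extends SketchCell> batch) {
        if (batch.size() == 1 && batch.get(0) instanceof CursorCell c) {
            return Optional.of(c.row());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<SketchCell> batch = List.of(new CursorCell("row-42".getBytes(StandardCharsets.UTF_8)));
        cursorRow(batch).ifPresent(row ->
                System.out.println("cursor at " + new String(row, StandardCharsets.UTF_8)));
    }
}

// Stand-in for a cell: only the row key matters for this sketch.
interface SketchCell {
    byte[] row();
}

// An ordinary data cell produced by a scan.
record DataCell(byte[] row, byte[] value) implements SketchCell {}

// Marker cell that carries nothing but the cursor row key. It is recognised by its
// type, so no special column family/qualifier/value has to be checked.
record CursorCell(byte[] row) implements SketchCell {}
```

A type check is cheap and cannot collide with real data, which is presumably part of why a dedicated type would be slightly faster than recognising a dummy cell by its contents.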
On Fri, Nov 21, 2025 at 9:04 AM Istvan Toth <[email protected]> wrote:
> That doesn't help us with the timeout setting.
>
> My plan is to remove the time field and logic from PhoenixScannerContext and rely on the existing one in ScannerContext, as the heartbeat logic uses that one.
>
> The lack of a setter is not a showstopper; I'm pretty sure we can set that via reflection if all else fails.
>
> Istvan
>
> On Fri, Nov 21, 2025 at 8:04 AM Tanuj Khurana <[email protected]> wrote:
>> Hi Istvan,
>>
>> As part of PHOENIX-7707, Phoenix extends the ScannerContext so that we can add custom fields to it.
>>
>> On Fri, 21 Nov 2025 at 10:53, Istvan Toth <[email protected]> wrote:
>> > Thanks for these points, Kadir.
>> >
>> > On Thu, Nov 20, 2025 at 8:59 PM Kadir Ozdemir <[email protected]> wrote:
>> > > The row key of the dummy result is not simply the last row that was scanned by RegionScannerImpl. It is computed by Phoenix coprocs based on the query. For example, for an ordered group by query, it should be the last row of the last group computed. For an unordered group by query, it never changes until the entire region is processed. For Phoenix to be able to use the HBase cursor, the coprocs need to be able to change the cursor value. Otherwise, there will be data integrity issues.
>> > >
>> > We can create a new synthetic Cursor Result and return it the same way we create and return a new dummy Cell now. In this regard I see no difference between the two.
>> >
>> > > Another reason for the dummy result is to provide end-to-end fair scheduling for Phoenix in the future. Without a Phoenix-level signal (the dummy result), the Phoenix client would not know if the server already spent the page time for a given query. I was thinking that we may be able to leverage this to decide if the current blocked thread should be released. This is a secondary concern, but I want to make sure we all understand the implications of replacing this Phoenix-level concept.
>> > >
>> > Good point.
>> >
>> > My current understanding is that if we set the *needCursorResult* flag on the scan, then HBase will return all cursor results to the client, and we can use those the same way we use the dummy cells, so I see no problem here either.
>> >
>> > In fact, the more I look at the HBase heartbeat/cursor implementation, the more it feels like it was tailor-made for implementing Phoenix paging (even though it did not come from Phoenix developers).
>> >
>> > The only snag I've found so far is that HBase creates the default ScannerContext and there is no easy way to set a custom paging time on it.
>> >
>> > > On Wed, Nov 19, 2025 at 9:27 PM Istvan Toth <[email protected]> wrote:
>> > > > I'm glad that you, as the original designer of the feature, have joined the discussion, Kadir.
>> > > >
>> > > > On Wed, Nov 19, 2025 at 10:56 PM Kadir Ozdemir <[email protected]> wrote:
>> > > > > Istvan,
>> > > > >
>> > > > > When I introduced server paging and the dummy result, Phoenix did not support ScannerContext. Now that Phoenix supports ScannerContext, we can think about leveraging it better for server paging.
>> > > > >
>> > > > I realize that it was necessary for HBase 1.x. This was a good design when HBase 1 support was a requirement, but specifically the dummy cell implementation detail is redundant now that HBase 2+ has native support for the same functionality.
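Using cursor results "the same way we use the dummy cells" on the client side might look roughly like the following minimal sketch. It assumes results arrive in row-key order, and `CursorConsumerSketch` and `SketchResult` are hypothetical stand-ins. In HBase 2+ the request is made with `Scan#setNeedCursorResult(true)` and a cursor is recognised with `Result#isCursor()`, but this code does not call HBase; a boolean flag plays that role.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: SketchResult stands in for HBase's Result. In HBase 2+ the
// client asks for cursor results with Scan#setNeedCursorResult(true) and tells them
// apart with Result#isCursor(); here a boolean flag plays that role.
final class CursorConsumerSketch {
    private final List<String> dataRows = new ArrayList<>();
    private String resumeRow; // last row the server reported as processed (data or cursor)

    /** Handles one result; cursor results advance the resume point but carry no data. */
    void accept(SketchResult result) {
        if (!result.cursor()) {
            dataRows.add(result.row());
        }
        resumeRow = result.row();
    }

    List<String> dataRows() {
        return dataRows;
    }

    String resumeRow() {
        return resumeRow;
    }

    public static void main(String[] args) {
        CursorConsumerSketch client = new CursorConsumerSketch();
        client.accept(new SketchResult("a", false)); // real row
        client.accept(new SketchResult("m", true));  // heartbeat: server got as far as "m"
        System.out.println(client.dataRows() + ", resume point " + client.resumeRow());
    }
}

record SketchResult(String row, boolean cursor) {}
```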
>> > > > >
>> > > > > "HBase is not aware that these are dummy cells, and is considering the rows as already processed when retrying scans after the region goes away from under the scan, i.e. it restarts the scan from AFTER the dummy cell's rowkey, leading to the scan skipping rows."
>> > > > >
>> > > > This assumption is no longer true in HBase 3.
>> > > >
>> > > > The client-side heartbeat logic in HBase 3 is thrown off by the dummy cells generated by Phoenix.
>> > > >
>> > > > I had to add this hack to get some tests in Phoenix to pass:
>> > > > https://github.com/stoty/hbase/tree/PHOENIX_DUMMY_CELL_WORKAROUND
>> > > >
>> > > > > That is the whole purpose of the dummy result, that is, not to scan the rows that have been scanned already. This allows Phoenix to make progress in the presence of table region movements; otherwise, every time a region moves or splits, Phoenix has to scan the region from the row key of the last valid result from this region instead of the last scanned row. What is the problem with this? Consider a large region and a scan with a very selective filter such that a large number of rows need to be scanned before returning a valid row. One can create a sequence of region movements that prevents Phoenix from making any progress for this scan.
>> > > > >
>> > > > Thanks for the explanation.
>> > > >
>> > > > I'm not questioning the usefulness of the paging design. The HBase community also agrees, so they have added this feature natively in HBase 2 in the form of the heartbeat/cursor feature.
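Kadir's argument about making progress under repeated region moves can be illustrated with a small simulation. It is a hypothetical model (the class and its parameters are invented for illustration): a region of `regionRows` rows, a filter that matches almost nothing, and a region move after every `rowsPerMove` scanned rows. Restarting after the last *returned* row never gets past the first window, while restarting after the last *scanned* row (what the dummy result, or a cursor, makes possible) finishes.

```java
import java.util.function.IntPredicate;

// Hypothetical simulation, not Phoenix or HBase code. A region holds rows
// 0..regionRows-1, `matches` is the (very selective) filter, and the region
// "moves" after every rowsPerMove scanned rows, forcing the client to restart.
final class PagingProgressSketch {
    private PagingProgressSketch() {}

    /**
     * Returns how many scan attempts it takes to reach the end of the region,
     * or -1 if the scan has not finished after maxAttempts attempts.
     *
     * @param resumeFromLastScanned true to restart after the last scanned row (what a
     *        dummy/cursor row key allows), false to restart after the last returned row
     */
    static int attemptsToFinish(int regionRows, int rowsPerMove, IntPredicate matches,
                                boolean resumeFromLastScanned, int maxAttempts) {
        if (rowsPerMove <= 0) {
            throw new IllegalArgumentException("rowsPerMove must be positive");
        }
        int start = 0;         // first row scanned in the current attempt
        int lastReturned = -1; // last row that passed the filter, across attempts
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            int end = Math.min(regionRows, start + rowsPerMove); // region moves at `end`
            for (int row = start; row < end; row++) {
                if (matches.test(row)) {
                    lastReturned = row;
                }
            }
            if (end == regionRows) {
                return attempt; // finished before the next region move
            }
            int lastScanned = end - 1;
            start = (resumeFromLastScanned ? lastScanned : lastReturned) + 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        IntPredicate onlyLastRow = row -> row == 999;
        System.out.println("resume after last scanned row:  "
                + attemptsToFinish(1000, 100, onlyLastRow, true, 50));
        System.out.println("resume after last returned row: "
                + attemptsToFinish(1000, 100, onlyLastRow, false, 50));
    }
}
```

With 1000 rows, a move every 100 rows and only row 999 matching, the first policy finishes in 10 attempts and the second never finishes.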
>> > > > >
>> > > > > Please note that Phoenix has some complex logic on the server side for handling various SQL language features including grouping, aggregating, sorting and joining. Implementing paging is much more complex in Phoenix than implementing keep alive and ScannerContext in HBase. Either you discovered an issue in Phoenix paging or a compatibility issue between HBase 2 and HBase 3. I suggest that we understand what the issue is first before replacing the dummy result.
>> > > > >
>> > > > It is the latter.
>> > > >
>> > > > The internal heartbeat retry logic in the HBase 3 client sees the dummy row and concludes that it should continue after an error (i.e. region move) from AFTER that row (see my HBase hack above).
>> > > >
>> > > > This is different from the HBase 2 logic, which does not do this.
>> > > >
>> > > > In a way, this is related to, and sometimes caused by, another Phoenix change I have made for HBase 3: PHOENIX-7728 <https://issues.apache.org/jira/browse/PHOENIX-7728>
>> > > > https://github.com/stoty/phoenix/blob/62112097bc1f050a760225663001fc0f084d4fb4/phoenix-core-server/src/main/java/org/apache/phoenix/coprocessor/GroupedAggregateRegionObserver.java#L482
>> > > >
>> > > > However, without removing the plus/minus row logic there, even more tests were failing, so HBase 3 doesn't work with the current Phoenix dummy row logic either.
>> > > >
>> > > > I agree that Phoenix still needs to be aware of paging, and will need logic to convert the Cursor rowkeys returned from inner scanners into rowkeys that make sense for the outer scanners and client, but my expectation is that we can simply convert the current Dummy cell logic that handles this to work with the cursor value instead on the server side.
>> > > >
>> > > > > On Wed, Nov 19, 2025 at 7:23 AM Tanuj Khurana <[email protected]> wrote:
>> > > > > > Hi Istvan,
>> > > > > >
>> > > > > > I agree that instead of using dummy cells, we should rely on keepalive/cursor mechanics. We have been working towards that. As part of PHOENIX-7707, I propagated the scanner context all the way down to Phoenix scanners. We can leverage that.
>> > > > > >
>> > > > > > Tanuj
>> > > > > >
>> > > > > > On Wed, 19 Nov 2025 at 20:09, Istvan Toth <[email protected]> wrote:
>> > > > > > > Thanks for your thoughtful response, Viraj.
>> > > > > > >
>> > > > > > > I have added my thoughts below.
>> > > > > > >
>> > > > > > > On Wed, Nov 19, 2025 at 2:38 PM Viraj Jasani <[email protected]> wrote:
>> > > > > > > > We also need to understand: what happens when the HBase client gets a heartbeat and the region moves?
>> > > > > > > >
>> > > > > > > I have checked that code in HBase, and the HBase client seems to handle this case transparently. We may of course find bugs, but handling that is part of the design.
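Converting an inner cursor into a row key that makes sense for the outer scanner could follow the rules Kadir gives for the dummy row key. This minimal sketch uses invented names (`QueryShape`, `ResumeKeySketch`, `outerResumeRow`) and simplified `String` row keys. The `PLAIN_SCAN` pass-through and the fallback to the scan start row before any group completes are assumptions, flagged in the comments; the group-by rules follow Kadir's description.

```java
// Hypothetical sketch: QueryShape, ResumeKeySketch and outerResumeRow are invented
// names. They encode the rules Kadir describes for the row key a Phoenix coprocessor
// would expose instead of the raw inner-scanner cursor.
final class ResumeKeySketch {
    private ResumeKeySketch() {}

    /**
     * Maps the cursor reported by the inner (HBase) scanner to the row key the outer
     * Phoenix scanner may safely report as processed.
     *
     * @param innerCursorRow        last row scanned by the inner scanner
     * @param lastCompletedGroupRow last row of the last fully computed group, or null
     * @param scanStartRow          row key the scan of this region started from
     */
    static String outerResumeRow(QueryShape shape, String innerCursorRow,
                                 String lastCompletedGroupRow, String scanStartRow) {
        return switch (shape) {
            // Assumption: with no aggregation, the last scanned row is safe to report.
            case PLAIN_SCAN -> innerCursorRow;
            // Ordered GROUP BY: only rows of completed groups count as processed.
            // Assumption: before the first group completes, fall back to the start row.
            case ORDERED_GROUP_BY -> lastCompletedGroupRow != null ? lastCompletedGroupRow : scanStartRow;
            // Unordered GROUP BY: nothing is final until the whole region is processed.
            case UNORDERED_GROUP_BY -> scanStartRow;
        };
    }

    public static void main(String[] args) {
        for (QueryShape shape : QueryShape.values()) {
            System.out.println(shape + " -> " + outerResumeRow(shape, "r9", "r5", "r0"));
        }
    }
}

enum QueryShape { PLAIN_SCAN, ORDERED_GROUP_BY, UNORDERED_GROUP_BY }
```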
>> > > > > > > >
>> > > > > > > > On Wed, Nov 19, 2025 at 7:05 PM Viraj Jasani <[email protected]> wrote:
>> > > > > > > > > Istvan, I think we should also involve dev@hbase and see what guidelines we are recommending so far for coprocs that would like to implement timeout features for long-running scans, wdyt?
>> > > > > > > >
>> > > > > > > Based on my current understanding, if the Scan / ScannerContext is correctly set up (allows partial rows, sets the time limit and requests a cursor), HBase will honor that and the Scan will return a heartbeat result when it times out.
>> > > > > > >
>> > > > > > > I THINK that's all we need. Of course, if we get stuck we should ask for help.
>> > > > > > >
>> > > > > > > > > On Wed, Nov 19, 2025 at 6:51 PM Viraj Jasani <[email protected]> wrote:
>> > > > > > > > >> Thank you for starting this thread, Istvan!
>> > > > > > > > >>
>> > > > > > > > >> This is an important issue. I have recently come across data correctness issues with PHOENIX-7733, to be fixed by HBASE-29722. This also got me thinking about the heartbeat and dummy cell overlap leading to possible data correctness issues.
>> > > > > > > > >>
>> > > > > > > > >> > I propose dropping the dummy cell mechanics from Phoenix, and using the HBase keepalive/cursor mechanics instead (we may not even need the cursors).
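The time-limit behaviour Istvan expects (scan, and if the limit expires before a row qualifies, return a heartbeat carrying a cursor rather than nothing) can be modelled as below. This is a minimal single-threaded sketch with invented names (`TimeLimitedScanSketch`, `Page`). It does not use HBase's ScannerContext, ignores partial rows, and takes an injected clock so the behaviour is deterministic.

```java
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Predicate;

// Hypothetical sketch with invented names; it models, in a single thread, a scan that
// gives up after a time limit and reports a heartbeat plus cursor instead of blocking.
// It does not use HBase's ScannerContext.
final class TimeLimitedScanSketch {
    private final List<String> rows;        // region rows in key order
    private final Predicate<String> filter; // rows that qualify for the result
    private final LongSupplier clockMs;     // injected clock, so tests are deterministic
    private final long timeLimitMs;
    private int next;                       // index of the next row to scan

    TimeLimitedScanSketch(List<String> rows, Predicate<String> filter,
                          LongSupplier clockMs, long timeLimitMs) {
        this.rows = rows;
        this.filter = filter;
        this.clockMs = clockMs;
        this.timeLimitMs = timeLimitMs;
    }

    boolean hasMore() {
        return next < rows.size();
    }

    /** Returns the next qualifying row, or a heartbeat carrying the last scanned row. */
    Page nextPage() {
        long deadline = clockMs.getAsLong() + timeLimitMs;
        String lastScanned = null;
        while (next < rows.size()) {
            String row = rows.get(next++);
            lastScanned = row;
            if (filter.test(row)) {
                return new Page(List.of(row), false, row);
            }
            if (clockMs.getAsLong() >= deadline) {
                return new Page(List.of(), true, lastScanned); // heartbeat + cursor
            }
        }
        return new Page(List.of(), false, lastScanned); // region exhausted
    }

    public static void main(String[] args) {
        long[] now = {0}; // fake clock that advances 1 ms per reading
        TimeLimitedScanSketch scan = new TimeLimitedScanSketch(
                List.of("a", "b", "c", "d", "e"), row -> row.equals("e"), () -> now[0]++, 2);
        while (scan.hasMore()) {
            System.out.println(scan.nextPage());
        }
    }
}

record Page(List<String> rows, boolean heartbeat, String cursorRow) {}
```

Checking the deadline only after a non-matching row means a qualifying row is always returned as soon as it is found; only the "nothing found yet" case turns into a heartbeat.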
>> > > > > > > > >>
>> > > > > > > > >> +1
>> > > > > > > > >>
>> > > > > > > > >> > If we cannot find a better way to shortcut some processing in Phoenix we may need to keep dummy cells internally, but we have to make sure that they never appear on the wire and reach the client.
>> > > > > > > > >>
>> > > > > > > > >> I don't think it is possible for Phoenix to ensure a dummy cell never reaches the HBase client.
>> > > > > > > > >>
>> > > > > > > I think if nothing else works, we can still catch and filter/convert them in RegionObserver.postScannerNext(). Of course, ideally we would never generate any Dummy cells in the first place.
>> > > > > > >
>> > > > > > > > >> > in that case we'd need to check and convert to a heartbeat scan result somehow
>> > > > > > > > >>
>> > > > > > > > >> This needs changes in HBase only, which I don't think HBase would (should) allow.
>> > > > > > > > >>
>> > > > > > > > >> > Is HBase 2/3 wire compatible enough that connecting with HBase 2.x clients to HBase 3 is even a possibility?
>> > > > > > > > >>
>> > > > > > > > >> Yes, wire compatibility is important. When this happens, the only thing we can do is set the page timeout high enough that we never have to send the dummy result to the client, or disable the paging feature.
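The fallback of catching dummy results in `RegionObserver.postScannerNext()` before they reach the wire boils down to a filter-and-remember step over the list of results the hook receives. A minimal sketch under stated assumptions: `DummyStripSketch` and `SketchRow` are hypothetical, and the `dummy` flag stands in for however Phoenix actually recognises its dummy results. It only removes the dummies and returns the cursor row; turning that row into a real heartbeat/cursor response is not shown, because it depends on HBase internals.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: SketchRow stands in for a Result, and `dummy` for whatever test
// Phoenix uses to recognise its dummy results. The `results` list models the list of
// results a postScannerNext() hook sees before they are returned to the client.
final class DummyStripSketch {
    private DummyStripSketch() {}

    /** Removes dummy results in place and returns the row key of the last one removed. */
    static Optional<String> stripDummies(List<SketchRow> results) {
        String cursorRow = null;
        Iterator<SketchRow> it = results.iterator();
        while (it.hasNext()) {
            SketchRow r = it.next();
            if (r.dummy()) {
                cursorRow = r.row();
                it.remove();
            }
        }
        return Optional.ofNullable(cursorRow);
    }

    public static void main(String[] args) {
        List<SketchRow> batch = new ArrayList<>(List.of(
                new SketchRow("a", false), new SketchRow("k", true)));
        Optional<String> cursor = stripDummies(batch);
        System.out.println("returned " + batch + ", cursor " + cursor.orElse("none"));
    }
}

record SketchRow(String row, boolean dummy) {}
```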
>> > > > > > > > >>
>> > > > > > > > >> On Thu, Nov 13, 2025 at 11:22 PM Istvan Toth <[email protected]> wrote:
>> > > > > > > > >>> I've been struggling with errors on the region moving tests on my HBase 3.0 WIP branch and have finally tracked the problems down to Phoenix's dummy Cells (as well as some built-in assumptions in Phoenix which are not true for HBase 3, see PHOENIX-7728 <https://issues.apache.org/jira/browse/PHOENIX-7728>).
>> > > > > > > > >>>
>> > > > > > > > >>> HBase is not aware that these are dummy cells, and is considering the rows as already processed when retrying scans after the region goes away from under the scan, i.e. it restarts the scan from AFTER the dummy cell's rowkey, leading to the scan skipping rows.
>> > > > > > > > >>>
>> > > > > > > > >>> I have been able to fix the tests by hacking HBase to ignore these dummy cells (and fixing the Phoenix-side problems described in PHOENIX-7728 <https://issues.apache.org/jira/browse/PHOENIX-7728>), but I don't think that hacking HBase to work with dummy cells is the way to go (or even if that would be accepted by HBase).
>> > > > > > > > >>>
>> > > > > > > > >>> AFAIU the dummy cells were added back in HBase 1.x, when there was no other way to ensure timely responses from the server.
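The row-skipping failure described in this message comes down to exclusive versus inclusive restart semantics. In this minimal sketch (names invented), assume, hypothetically, that the row named by a dummy cell has not actually been delivered to the client: an exclusive restart after that key then silently drops it, while an inclusive restart does not. It illustrates the semantics only and is not a model of the HBase 3 client's retry code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical sketch with invented names: it contrasts restarting a scan strictly
// AFTER a row key (treating that row as already processed) with restarting AT it.
final class RestartSemanticsSketch {
    private RestartSemanticsSketch() {}

    /** Rows scanned after a restart from resumeKey in a region holding regionRows. */
    static List<String> rowsAfterRestart(NavigableSet<String> regionRows, String resumeKey,
                                         boolean keyAlreadyProcessed) {
        // keyAlreadyProcessed == true: exclusive restart, so resumeKey itself is skipped.
        return new ArrayList<>(regionRows.tailSet(resumeKey, !keyAlreadyProcessed));
    }

    public static void main(String[] args) {
        NavigableSet<String> region = new TreeSet<>(List.of("r1", "r2", "r3", "r4"));
        System.out.println("restart after r2: " + rowsAfterRestart(region, "r2", true));
        System.out.println("restart at r2:    " + rowsAfterRestart(region, "r2", false));
    }
}
```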
>> > > > > > > > >>>
>> > > > > > > > >>> HBase 2 has introduced the keepalive/cursor mechanics, which IUC serve the exact same purpose as the Phoenix dummy cells.
>> > > > > > > > >>>
>> > > > > > > > >>> I propose dropping the dummy cell mechanics from Phoenix, and using the HBase keepalive/cursor mechanics instead (we may not even need the cursors).
>> > > > > > > > >>>
>> > > > > > > > >>> If we cannot find a better way to shortcut some processing in Phoenix we may need to keep dummy cells internally, but we have to make sure that they never appear on the wire and reach the client (i.e. in that case we'd need to check and convert to a heartbeat scan result somehow).
>> > > > > > > > >>>
>> > > > > > > > >>> We will also need to consider backwards compatibility.
>> > > > > > > > >>>
>> > > > > > > > >>> Is HBase 2/3 wire compatible enough that connecting with HBase 2.x clients to HBase 3 is even a possibility?
>> > > > > > > > >>>
>> > > > > > > > >>> Do we want to support that?
>> > > > > > > > >>>
>> > > > > > > > >>> When using HBase 2.x, if Phoenix starts to use the HBase keepalive mechanics, will old clients work with that without changes, or do we need to keep sending Dummy cells for older clients?
>> > > > > > > > >>>
>> > > > > > > > >>> Looking forward to hearing your take,
>> > > > > > > > >>>
>> > > > > > > > >>> Istvan
>> > > > > > >
>> > > > > > > --
>> > > > > > > *István Tóth* | Sr. Staff Software Engineer
>> > > > > > > *Email*: [email protected]
>> > > > > > > cloudera.com <https://www.cloudera.com>
