Thanks for these points, Kadir.

On Thu, Nov 20, 2025 at 8:59 PM Kadir Ozdemir <[email protected]>
wrote:

> The row key of the dummy result is not simply the last row that was scanned
> by RegionScannerImpl. It is computed by Phoenix coprocs based on the query.
> For example, for an ordered group by query, it should be the last row of
> the last group computed. For an unordered group by query, it nevers changes
> until the entire region is processed.  For Phoenix to be able to use the
> HBase cursor, the coprocs needs to be able to change the cursor value.
> Otherwise, there will be data integrity issues.
>

We can create a new synthetic Cursor Result and return that the same way we
create and return a new dummy Cell now.
In this regard I see no difference between the two.


>
> Another reason for the dummy result is to provide an end-to-end fair
> scheduling for Phoenix in future. Without a Phoenix level signal (the dummy
> result), the Phoenix client would not know if the server already spent the
> page time for a given query. I was thinking that we may be able to leverage
> this to decide if the current blocked thread should be released. This is a
> secondary concern but I want to make sure we all understand the
> implications of replacing this Phoenix level concept.
>

Good point.

My current understanding is that if we set the *needCursorResult* flag on
the scan,
then HBase will return all cursor results to the client, and we can use
those the same way
we use the dummy cells, so I see no problem here either.

In fact the more I look the Hbase Heartbeat/cursor implementation, the more
it feels like it was
taylor made for implementing Phoenix paging (even though it was not coming
from Phoenix developers)

The only snag I've found so far is that HBase creates the default
ScannerContext and there is no easy way
to set a custom paging time on it.


> On Wed, Nov 19, 2025 at 9:27 PM Istvan Toth <[email protected]>
> wrote:
>
> > I'm glad that you as the original designer of the feature has joined the
> > discussion, Kadir.
> >
> > On Wed, Nov 19, 2025 at 10:56 PM Kadir Ozdemir <[email protected]> wrote:
> >
> > > Istvan,
> > >
> > > When I introduced server paging and the dummy result, Phoenix did not
> > > support ScannerContext. Now that Phoenix supports ScannerContext, we
> can
> > > think about leveraging it better for server paging.
> > >
> >
> > I realize that it was necessary for HBase 1.x. This was a good design
> when
> > HBase 1
> > support was a requirement, but specifically the dummy cell implementation
> > detail
> > is redundant now that HBase 2+ has native support for the same
> > functionality.
> >
> >
> > >
> > > "HBase is not aware that these are dummy cells, and is considering the
> > rows
> > > as already processed when retrying scans after the region goes away
> from
> > > under the scan, i.e. it restarts the scan from AFTER the dummy cell's
> > > rowkey, leading to the scan skipping rows."
> > >
> >
> > This assumption is no longer true in HBase 3.
> >
> > The client side heartbeat logic in HBase 3 is thrown off by the dummy
> cells
> > generated by Phoenix.
> >
> > I had to add this hack to get some tests in Phoenix to pass:
> > https://github.com/stoty/hbase/tree/PHOENIX_DUMMY_CELL_WORKAROUND
> >
> >
> > >
> > > That is the whole purpose of the dummy result, that is, not to scan the
> > > rows that have been scanned already. This allows Phoenix to make
> progress
> > > in the presence of table region movements, otherwise every time a
> region
> > > moves or splits, Phoenix has to scan the region from the row key of the
> > > last valid result from this region instead of the last scanned row.
> What
> > is
> > > the problem with this? Consider a large region and a scan with a very
> > > selective filter such that a large number of rows need to be scanned
> > before
> > > returning a valid row. One can create a sequence of region movements
> that
> > > prevents Phoenix from making any progress for this scan
> >
> >
> > Thanks for the explanation.
> >
> > I'm not questioning the usefulness of the paging design.
> > The HBase community also agrees, so they have added this feature natively
> > in
> > HBase 2 in the form of the heartbeat/cursor feature.
> >
> > .
> > >
> > > Please note that Phoenix has some complex logic on the server side for
> > > handling various SQL language features including grouping, aggregating,
> > > sorting and joining. Implementing paging is much more complex in
> Phoenix
> > > than implementing keep alive and ScannerContext in HBase. Either you
> > > discovered an issue in Phoenix paging or a compatibility issue between
> > > HBase 2 and HBase 3. I suggest that we understand what the issue is
> first
> > > before replacing the dummy result.
> > >
> >
> > It is the latter.
> >
> > The internal heartbeat retry logic in the HBase 3 client sees the dummy
> row
> > and concludes that
> > it should continue after an error (i.e. region move) from AFTER that row.
> > (see my HBase hack above)
> >
> > This is different from the HBase 2 logic, which does not do this.
> >
> > In a way, this is related to, and sometimes casued by another Phoenix
> > change I have made for HBase 3:
> > PHOENIX-7728 <https://issues.apache.org/jira/browse/PHOENIX-7728>
> >
> >
> https://github.com/stoty/phoenix/blob/62112097bc1f050a760225663001fc0f084d4fb4/phoenix-core-server/src/main/java/org/apache/phoenix/coprocessor/GroupedAggregateRegionObserver.java#L482
> > <https://issues.apache.org/jira/browse/PHOENIX-7728>
> >
> > However, without removing the plus/minus row logic there even more tests
> > were failing, so
> > Hbase 3 doesn't work with the current Phoenix dummy row logic either.
> >
> > <https://issues.apache.org/jira/browse/PHOENIX-7728>I agree that Phoenix
> > still needs to be aware of paging, and will need logic to convert the
> > Cursor rowkeys returned from inner scanners into rowkeys that make sense
> > for the outer scanners and client, but
> > my expectation is that we can simply? convert the current Dummy cell
> logic
> > that handles this to work with the
> > cursor value instead on the server side.
> >
> >
> > >
> > >
> > >
> > > On Wed, Nov 19, 2025 at 7:23 AM Tanuj Khurana <[email protected]>
> > wrote:
> > >
> > > > Hi Istvan,
> > > >
> > > > I agree that instead of using dummy cells, we should rely on
> > > > keepalive/cursor mechanics. We have been working towards that. As
> part
> > of
> > > > PHOENIX-7707, I  propagated the scanner context all the way down to
> > > phoenix
> > > > scanners. We can leverage that.
> > > >
> > > > Tanuj
> > > >
> > > > On Wed, 19 Nov 2025 at 20:09, Istvan Toth <[email protected]
> >
> > > > wrote:
> > > >
> > > > > Thanks for your thoughtful response, Viraj.
> > > > >
> > > > > I have added my thoughts below.
> > > > >
> > > > > On Wed, Nov 19, 2025 at 2:38 PM Viraj Jasani <[email protected]>
> > > wrote:
> > > > >
> > > > > > We also need to understand: what happens when hbase client gets
> > > > heartbeat
> > > > > > and the region moves?
> > > > > >
> > > > > > I have checked that code in HBase, and the HBase client seems to
> > > handle
> > > > > this case transparently.
> > > > > We may of course find bugs, but handling that is part of the
> design.
> > > > >
> > > > >
> > > > > >
> > > > > > On Wed, Nov 19, 2025 at 7:05 PM Viraj Jasani <[email protected]
> >
> > > > wrote:
> > > > > >
> > > > > > > Istvan, I think we should also involve dev@hbase and see what
> > > > > guidelines
> > > > > > > we are recommending so far for coprocs that would like to
> > implement
> > > > > > timeout
> > > > > > > features for long running scans, wdyt?
> > > > > >
> > > > >
> > > > > Based on my current understanding, if the Scan / ScannerContext is
> > > > > correctly set up (allows partial rows, sets the time limit and
> > > requests a
> > > > > cursor),
> > > > > HBase will honor that and the Scan will return a heartbeat result
> > when
> > > it
> > > > > times out.
> > > > >
> > > > > I THINK that's all we need. Of course if we get stuck we should ask
> > for
> > > > > help.
> > > > >
> > > > >
> > > > > > >
> > > > > > > On Wed, Nov 19, 2025 at 6:51 PM Viraj Jasani <
> [email protected]
> > >
> > > > > wrote:
> > > > > > >
> > > > > > >> Thank you for starting this thread, Istvan!
> > > > > > >>
> > > > > > >> This is an important issue. I have recently come across data
> > > > > correctness
> > > > > > >> issues with PHOENIX-7733, to be fixed by HBASE-29722. This
> also
> > > got
> > > > me
> > > > > > >> thinking about the heartbeat and dummy cell overlap leading to
> > > > > possible
> > > > > > >> data correctness issues.
> > > > > > >>
> > > > > > >> > I propose dropping the dummy cell mechanics from Phoenix,
> and
> > > > using
> > > > > > the
> > > > > > >> > HBase keepalive/cursor mechanics instead (we may not even
> need
> > > the
> > > > > > >> cursors).
> > > > > > >>
> > > > > > >> +1
> > > > > > >>
> > > > > > >> > If we cannot find a better way to shortcut some processing
> in
> > > > > Phoenix
> > > > > > we
> > > > > > >> > may need to keep dummy cells internally, but we have to make
> > > sure
> > > > > that
> > > > > > >> they
> > > > > > >> > never appear on the wire and reach the client.
> > > > > > >>
> > > > > > >> I don't think it is possible for Phoenix to ensure a dummy
> cell
> > > > never
> > > > > > >> reaches the HBase client.
> > > > > >
> > > > >
> > > > > I think if nothing else works, we can still catch and
> filter/convert
> > > them
> > > > > in RegionObserver.postScannerNext().
> > > > > Of course ideally we would never generate any Dummy cells in the
> > first
> > > > > place.
> > > > >
> > > > >
> > > > > > >>
> > > > > > >> > in that case we'd need
> > > > > > >> > to check and convert to a heartbeat scan result somehow
> > > > > > >>
> > > > > > >> This needs changes in HBase only, which I don't think HBase
> > would
> > > > > > >> (should) allow.
> > > > > > >>
> > > > > > >> > Is Hbase 2/3 wire compatible enough that connecting with
> HBase
> > > 2.x
> > > > > > >> clients
> > > > > > >> > to Hbase 3 even a possibility ?
> > > > > > >>
> > > > > > >> Yes, wire compatibility is important. When this happens, the
> > only
> > > > > thing
> > > > > > >> we can do is set the page timeout high enough that we never
> have
> > > to
> > > > > send
> > > > > > >> the dummy result to the client, or disable the paging feature.
> > > > > > >>
> > > > > > >>
> > > > > > >> On Thu, Nov 13, 2025 at 11:22 PM Istvan Toth <
> [email protected]>
> > > > > wrote:
> > > > > > >>
> > > > > > >>> I've been struggling with errors on the region moving tests
> on
> > my
> > > > > HBase
> > > > > > >>> 3.0
> > > > > > >>> WIP branch and have finally tracked the problems down to
> > > Phoenix's
> > > > > > dummy
> > > > > > >>> Cells (as well as some built-in assumptions in Phoenix which
> > are
> > > > not
> > > > > > true
> > > > > > >>> for Hbase 3, see PHOENIX-7728
> > > > > > >>> <https://issues.apache.org/jira/browse/PHOENIX-7728>)
> > > > > > >>>
> > > > > > >>> HBase is not aware that these are dummy cells, and is
> > considering
> > > > the
> > > > > > >>> rows
> > > > > > >>> as already processed when retrying scans after the region
> goes
> > > away
> > > > > > from
> > > > > > >>> under the scan, i.e. it restarts the scan from AFTER the
> dummy
> > > > cell's
> > > > > > >>> rowkey, leading to the scan skipping rows.
> > > > > > >>>
> > > > > > >>> I have been able to fix the tests by hacking Hbase to ignore
> > > these
> > > > > > dummy
> > > > > > >>> cells (and fixing the phoenix side problems described in
> > > > PHOENIX-7728
> > > > > > >>> <https://issues.apache.org/jira/browse/PHOENIX-7728>), but I
> > > don't
> > > > > > think
> > > > > > >>> that hacking HBase to work with dummy cells is the way to go
> > (or
> > > > even
> > > > > > if
> > > > > > >>> that would be accepted by HBase).
> > > > > > >>>
> > > > > > >>> AFAIU the dummy cells were added back in the HBase 1.x when
> > there
> > > > was
> > > > > > no
> > > > > > >>> other way to ensure timely responses from the server.
> > > > > > >>>
> > > > > > >>> HBase 2 has introduced the keepalive/cursor mechanics, which
> > IUC
> > > > > serves
> > > > > > >>> the
> > > > > > >>> exact same purpose at the Phoenix dummy cells.
> > > > > > >>>
> > > > > > >>> I propose dropping the dummy cell mechanics from Phoenix, and
> > > using
> > > > > the
> > > > > > >>> HBase keepalive/cursor mechanics instead (we may not even
> need
> > > the
> > > > > > >>> cursors).
> > > > > > >>>
> > > > > > >>> If we cannot find a better way to shortcut some processing in
> > > > Phoenix
> > > > > > we
> > > > > > >>> may need to keep dummy cells internally, but we have to make
> > sure
> > > > > that
> > > > > > >>> they
> > > > > > >>> never appear on the wire and reach the client. (i.e. in that
> > case
> > > > > we'd
> > > > > > >>> need
> > > > > > >>> to check and convert to a heartbeat scan result somehow)
> > > > > > >>>
> > > > > > >>> We will also need to consider backwards compatibility.
> > > > > > >>>
> > > > > > >>> Is Hbase 2/3 wire compatible enough that connecting with
> HBase
> > > 2.x
> > > > > > >>> clients
> > > > > > >>> to Hbase 3 even a possibility ?
> > > > > > >>>
> > > > > > >>> Do we want to support that ?
> > > > > > >>>
> > > > > > >>> When using Hbase 2.x, if Phoenix starts to use the HBase
> > > keepalive
> > > > > > >>> mechanics, will old clients work with that without changes,
> or
> > do
> > > > we
> > > > > > need
> > > > > > >>> to keep sending Dummy cells for older clients ?
> > > > > > >>>
> > > > > > >>> Looking forward to hearing your take,
> > > > > > >>>
> > > > > > >>> Istvan
> > > > > > >>>
> > > > > > >>
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > *István Tóth* | Sr. Staff Software Engineer
> > > > > *Email*: [email protected]
> > > > > cloudera.com <https://www.cloudera.com>
> > > > > [image: Cloudera] <https://www.cloudera.com/>
> > > > > [image: Cloudera on Twitter] <https://twitter.com/cloudera>
> [image:
> > > > > Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
> > > > Cloudera
> > > > > on LinkedIn] <https://www.linkedin.com/company/cloudera>
> > > > > ------------------------------
> > > > > ------------------------------
> > > > >
> > > >
> > >
> >
> >
> > --
> > *István Tóth* | Sr. Staff Software Engineer
> > *Email*: [email protected]
> > cloudera.com <https://www.cloudera.com>
> > [image: Cloudera] <https://www.cloudera.com/>
> > [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> > Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
> Cloudera
> > on LinkedIn] <https://www.linkedin.com/company/cloudera>
> > ------------------------------
> > ------------------------------
> >
>


-- 
*István Tóth* | Sr. Staff Software Engineer
*Email*: [email protected]
cloudera.com <https://www.cloudera.com>
[image: Cloudera] <https://www.cloudera.com/>
[image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera
on LinkedIn] <https://www.linkedin.com/company/cloudera>
------------------------------
------------------------------

Reply via email to