I've been struggling with errors in the region-moving tests on my HBase 3.0 WIP branch, and have finally tracked the problems down to Phoenix's dummy cells (as well as some built-in assumptions in Phoenix that do not hold for HBase 3, see PHOENIX-7728 <https://issues.apache.org/jira/browse/PHOENIX-7728>).

HBase is not aware that these are dummy cells, and treats their rows as already processed when it retries a scan after the region goes away from under it. That is, it restarts the scan from AFTER the dummy cell's rowkey, which makes the scan skip rows.

I have been able to fix the tests by hacking HBase to ignore these dummy cells (and by fixing the Phoenix-side problems described in PHOENIX-7728 <https://issues.apache.org/jira/browse/PHOENIX-7728>), but I don't think that hacking HBase to work with dummy cells is the way to go (or even that it would be accepted by HBase).

AFAIU the dummy cells were added back in the HBase 1.x days, when there was no other way to ensure timely responses from the server. HBase 2 introduced the keepalive/cursor mechanics, which IIUC serve the exact same purpose as the Phoenix dummy cells.

I propose dropping the dummy cell mechanics from Phoenix and using the HBase keepalive/cursor mechanics instead (we may not even need the cursors). If we cannot find a better way to shortcut some processing in Phoenix, we may need to keep dummy cells internally, but we have to make sure that they never appear on the wire and never reach the client (i.e. in that case we'd need to detect them and convert them to a heartbeat scan result somehow).

We will also need to consider backwards compatibility. Are HBase 2 and 3 wire compatible enough that connecting to HBase 3 with HBase 2.x clients is even a possibility? Do we want to support that? When using HBase 2.x, if Phoenix starts to use the HBase keepalive mechanics, will old clients work with that without changes, or do we need to keep sending dummy cells for older clients?

Looking forward to hearing your take,
Istvan
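
P.S. To make the skipping problem concrete, here is a toy model in plain Java. It is not HBase or Phoenix code: the class, the row keys, the server-side row count and the "timeout after three rows" are all made up for illustration, and the real retry logic is more involved. It only sketches the idea that a keepalive carrying a rowkey moves the client's restart point, while a cell-less heartbeat does not.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: none of these names exist in HBase or Phoenix.
public class DummyCellSkipSketch {
    // Hypothetical region contents, in rowkey order.
    static final List<String> ROWS = Arrays.asList("r1", "r2", "r3", "r4", "r5", "r6");

    /** One toy RPC response: a cell-less heartbeat, or a cell carrying a rowkey. */
    static final class Response {
        final String rowKey; // null models a heartbeat with no cells
        Response(String rowKey) { this.rowKey = rowKey; }
        boolean isHeartbeat() { return rowKey == null; }
    }

    /** Toy server-side aggregate: counts rows strictly after startAfter (null = from the start). */
    static long serverCount(String startAfter) {
        long n = 0;
        for (String row : ROWS) {
            if (startAfter == null || row.compareTo(startAfter) > 0) {
                n++;
            }
        }
        return n;
    }

    /** Runs one scan that is interrupted by a region move after the first keepalive. */
    static long countWithRegionMove(boolean useDummyCell) {
        String lastRowSeen = null; // the client's restart point

        // Assumption: the server has processed three rows when its time budget runs out,
        // and answers either with a dummy cell on the last processed row or with a heartbeat.
        int processedBeforeTimeout = 3;
        Response keepAlive = useDummyCell
                ? new Response(ROWS.get(processedBeforeTimeout - 1))
                : new Response(null);

        // The client cannot tell a dummy cell from a real one, so it records the rowkey.
        if (!keepAlive.isHeartbeat()) {
            lastRowSeen = keepAlive.rowKey;
        }

        // The region goes away: the partial server-side count is lost and the client
        // retries the scan from just after the last rowkey it has seen.
        return serverCount(lastRowSeen);
    }

    public static void main(String[] args) {
        System.out.println("dummy cell: " + countWithRegionMove(true));  // prints "dummy cell: 3"
        System.out.println("heartbeat:  " + countWithRegionMove(false)); // prints "heartbeat:  6"
    }
}
```

In this model the dummy cell variant loses r1..r3 after the retry, while the heartbeat variant restarts from the beginning and sees all six rows.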
