[ https://issues.apache.org/jira/browse/PHOENIX-5315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Jacoby updated PHOENIX-5315:
-------------------------------------
    Description: 
When replicating Phoenix tables using the HBase cross-cluster replication 
facility, replicating the base table alone should be sufficient (and, for 
correctness and the avoidance of race conditions and inconsistencies, must 
be). On the sink cluster, the replication client's application of mutations 
from the replication stream to the local base table should trigger all 
necessary index update operations. Where implementation details currently 
prevent this, those details should be reworked.
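To make the intent concrete, here is a minimal, self-contained sketch (a toy model, not Phoenix's actual API; all names are illustrative) of why replaying only the base-table mutation is enough: a Phoenix-style global index row key is a pure function of the indexed column value and the base row key, so the sink can regenerate every index write locally from the base mutation plus the schema.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model, NOT Phoenix's real API: a global index whose row key is the
// indexed column value, a zero-byte separator, and the base row key
// (roughly how Phoenix lays out global index keys). Because index rows
// are derivable from the base mutation plus the schema, a sink cluster
// that receives only the base-table mutation can regenerate every index
// write locally.
public class IndexRegenSketch {

    // Derive the index row key for one indexed column value.
    static String indexRowKey(String indexedValue, String baseRowKey) {
        return indexedValue + "\0" + baseRowKey;
    }

    // Given one base-table upsert (row key plus column values) and the
    // list of indexed columns, return every row the sink must write:
    // the base row itself plus one regenerated row per index.
    static List<String> rowsToWrite(String baseRowKey,
                                    Map<String, String> cols,
                                    List<String> indexedCols) {
        List<String> rows = new ArrayList<>();
        rows.add(baseRowKey); // the replicated base-table write
        for (String c : indexedCols) {
            rows.add(indexRowKey(cols.get(c), baseRowKey)); // regenerated locally
        }
        return rows;
    }

    public static void main(String[] args) {
        // One replicated base write fans out to three local writes on the
        // sink: the base row plus two index rows.
        System.out.println(rowsToWrite(
            "org1#user42",
            Map.of("EMAIL", "a@example.com", "CITY", "SF"),
            List.of("EMAIL", "CITY")));
    }
}
```

The real implementation would of course operate on HBase Cells and go through Phoenix's index maintainers; the point is only that index state is derivable from base-table state plus schema, so index tables never need to appear in the replication stream.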

This also has an important efficiency benefit: no matter how many indexes are 
defined for a base table, only the base table updates need be replicated 
(presuming the Phoenix schema is synchronized across all sites by some other 
external means).

This work would likely comprise multiple components, so we should use this 
issue as an umbrella. We'd need:
 # A Phoenix implementation of HBase's ReplicationEndpoint that tails the WAL 
like a normal replication endpoint. However, rather than writing to HBase's 
replication sink APIs (which issue HBase RPCs to a remote cluster), it should 
write to a new Phoenix Endpoint coprocessor.
 # An HBase coprocessor Endpoint hook that takes in a request from a remote 
cluster (containing both the WALEdit's data and the WALKey's annotated 
metadata, telling the sink cluster which tenant_id, logical table name, and 
timestamp the data is associated with). Ideally the API's message format 
should be configurable, and could be either a protobuf or an Avro schema 
similar to the one described by PHOENIX-5443. The Endpoint hook would take the 
metadata and data and regenerate the complete set of Phoenix mutations, both 
data and index, just as the Phoenix client did for the original SQL statement 
that generated the source-side edits. These mutations would then be written 
on the sink cluster via the normal Phoenix write path.

(Unfortunately, HBase uses the term "endpoint" to mean both a replication 
plugin and a stored-procedure-like coprocessor hook. To be clear, item 1 above 
is a replication plugin and item 2 is a coprocessor hook.)
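As a rough sketch of how such a setup might eventually be wired together from the HBase shell (the endpoint class name is hypothetical, since component 1 does not exist yet; the shell syntax itself is standard HBase):

```
# Replicate only the base table: enable replication scope on its column
# family (Phoenix's default family is '0'). Index tables simply keep
# REPLICATION_SCOPE 0 and are never shipped.
alter 'MY_TABLE', {NAME => '0', REPLICATION_SCOPE => '1'}

# Register the custom Phoenix replication endpoint (component 1) as the
# peer, in place of HBase's default inter-cluster endpoint. The class
# name 'org.apache.phoenix.replication.PhoenixReplicationEndpoint' is a
# hypothetical name for the plugin this issue proposes.
add_peer 'phoenix_peer',
         ENDPOINT_CLASSNAME => 'org.apache.phoenix.replication.PhoenixReplicationEndpoint',
         CLUSTER_KEY => 'sinkzk1,sinkzk2,sinkzk3:2181:/hbase'
```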

 

  was:
When replicating Phoenix tables using the HBase cross-cluster replication 
facility, replicating the base table alone should be sufficient (and, for 
correctness and the avoidance of race conditions and inconsistencies, must 
be). On the sink cluster, the replication client's application of mutations 
from the replication stream to the local base table should trigger all 
necessary index update operations. Where implementation details currently 
prevent this, those details should be reworked.

This also has an important efficiency benefit: no matter how many indexes are 
defined for a base table, only the base table updates need be replicated 
(presuming the Phoenix schema is synchronized across all sites by some other 
external means).


> Cross cluster replication of the base table only should be sufficient
> ---------------------------------------------------------------------
>
>                 Key: PHOENIX-5315
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5315
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Andrew Kyle Purtell
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
