[
https://issues.apache.org/jira/browse/PHOENIX-5315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Geoffrey Jacoby updated PHOENIX-5315:
-------------------------------------
Description:
When replicating Phoenix tables using the HBase cross-cluster replication
facility, it should be sufficient (and, for correctness and the avoidance of
race conditions and inconsistencies, necessary) to replicate the base table
only. On the sink cluster, the replication client's application of mutations
from the replication stream to the local base table should trigger all
necessary index update operations. To the extent this does not happen today due
to implementation details, those details should be reworked.
This also has important efficiency benefits: no matter how many indexes are
defined for a base table, only the base table updates need be replicated
(presuming the Phoenix schema is synchronized across all sites by some other
external means).
This would likely comprise multiple components, so we should use this issue
as an umbrella. We'd need:
# A Phoenix implementation of HBase's ReplicationEndpoint that tails the WAL
like a normal replication endpoint. However, rather than writing to HBase's
replication sink APIs (which issue HBase RPCs to a remote cluster), it should
write to a new Phoenix Endpoint coprocessor.
# An HBase coprocessor Endpoint hook that takes in a request from a remote
cluster (containing both the WALEdit's data and the WALKey's annotated metadata
telling the remote cluster what tenant_id, logical table name, and timestamp
the data is associated with). Ideally the API's message format should be
configurable, and could be either a protobuf or an Avro schema similar to the
one described by PHOENIX-5443. The endpoint hook would take the metadata plus
data and regenerate a complete set of Phoenix mutations, both data and index,
just as the Phoenix client did for the original SQL statement that generated
the source-side edits. These mutations would be written to the remote cluster
by the normal Phoenix write path.
(Unfortunately, HBase uses the term "endpoint" to mean both a replication
plugin and a stored-procedure-like coprocessor hook. To be clear, item 1 above
is a replication plugin; item 2 is a coprocessor hook.)
was:
When replicating Phoenix tables using the HBase cross-cluster replication
facility, it should be sufficient (and, for correctness and the avoidance of
race conditions and inconsistencies, necessary) to replicate the base table
only. On the sink cluster, the replication client's application of mutations
from the replication stream to the local base table should trigger all
necessary index update operations. To the extent this does not happen today due
to implementation details, those details should be reworked.
This also has important efficiency benefits: no matter how many indexes are
defined for a base table, only the base table updates need be replicated
(presuming the Phoenix schema is synchronized across all sites by some other
external means).
> Cross cluster replication of the base table only should be sufficient
> ---------------------------------------------------------------------
>
> Key: PHOENIX-5315
> URL: https://issues.apache.org/jira/browse/PHOENIX-5315
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Andrew Kyle Purtell
> Priority: Major
>
> When replicating Phoenix tables using the HBase cross-cluster replication
> facility, it should be sufficient (and, for correctness and the avoidance of
> race conditions and inconsistencies, necessary) to replicate the base table
> only. On the sink cluster, the replication client's application of mutations
> from the replication stream to the local base table should trigger all
> necessary index update operations. To the extent this does not happen today
> due to implementation details, those details should be reworked.
> This also has important efficiency benefits: no matter how many indexes are
> defined for a base table, only the base table updates need be replicated
> (presuming the Phoenix schema is synchronized across all sites by some other
> external means).
> This would likely comprise multiple components, so we should use this issue
> as an umbrella. We'd need:
> # A Phoenix implementation of HBase's ReplicationEndpoint that tails the WAL
> like a normal replication endpoint. However, rather than writing to HBase's
> replication sink APIs (which issue HBase RPCs to a remote cluster), it should
> write to a new Phoenix Endpoint coprocessor.
> # An HBase coprocessor Endpoint hook that takes in a request from a remote
> cluster (containing both the WALEdit's data and the WALKey's annotated
> metadata telling the remote cluster what tenant_id, logical table name, and
> timestamp the data is associated with). Ideally the API's message format
> should be configurable, and could be either a protobuf or an Avro schema
> similar to the one described by PHOENIX-5443. The endpoint hook would take
> the metadata plus data and regenerate a complete set of Phoenix mutations,
> both data and index, just as the Phoenix client did for the original SQL
> statement that generated the source-side edits. These mutations would be
> written to the remote cluster by the normal Phoenix write path.
> (Unfortunately, HBase uses the term "endpoint" to mean both a replication
> plugin and a stored-procedure-like coprocessor hook. To be clear, item 1
> above is a replication plugin; item 2 is a coprocessor hook.)
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)