[
https://issues.apache.org/jira/browse/PHOENIX-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kadir Ozdemir reassigned PHOENIX-7018:
--------------------------------------
Assignee: Viraj Jasani
> Server side index maintainer caching for read, write, and replication
> ---------------------------------------------------------------------
>
> Key: PHOENIX-7018
> URL: https://issues.apache.org/jira/browse/PHOENIX-7018
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Kadir Ozdemir
> Assignee: Viraj Jasani
> Priority: Major
>
> The relationship between a data table and its index is somewhat involved.
> Phoenix needs to transform a data table row into the corresponding index row,
> extract a data table row key (i.e., a primary key) from an index table row
> key (a secondary key), and map data table columns to index table included
> columns. These operations and their metadata are encapsulated in the
> IndexMaintainer class. Phoenix creates a separate IndexMaintainer object in
> memory for each index table on the client side. IndexMaintainer objects are
> then serialized using the protobuf library and sent to the servers along with
> the mutations on the data tables and the scans on the index tables. The
> Phoenix server code (more accurately, Phoenix coprocessors) then uses
> IndexMaintainer objects to update indexes and to leverage indexes for queries.
> Phoenix coprocessors use the IndexMaintainer objects associated with a given
> batch of mutations or scan only once (i.e., for that batch or scan), yet the
> Phoenix client sends these objects along with every batch of mutations and
> every scan.
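> As a rough illustration of this current per-request flow, the sketch below
> attaches a serialized maintainer blob to a scan as an attribute. The attribute
> name and the serialization helper are illustrative stand-ins, not the exact
> Phoenix internals.
> {code:java}
> import org.apache.hadoop.hbase.client.Scan;
>
> public class CurrentReadPathSketch {
>     // Stand-in for the protobuf-serialized list of IndexMaintainer objects
>     // that the client builds for the data table's indexes.
>     static byte[] serializeIndexMaintainers() {
>         return new byte[0]; // placeholder
>     }
>
>     static Scan buildIndexScan() {
>         Scan scan = new Scan();
>         // Today the serialized maintainers ride along with every scan (and,
>         // on the write path, with every batch of mutations); the coprocessor
>         // deserializes them on each request.
>         scan.setAttribute("IndexMaintainers", serializeIndexMaintainers());
>         return scan;
>     }
> }
> {code}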
> Secondary indexes improve the performance of queries that filter on the
> indexed columns. Secondary indexes are required to be consistent with their
> data tables; consistency here means that, regardless of whether a query is
> served from the data table or the index table, the same result is returned.
> This consistency promise cannot be kept when data table rows and their index
> table rows are replicated independently, which is what happens with
> HBase-level replication.
> HBase replicates the WALs (Write Ahead Logs) of region servers and replays
> these WALs at the destination cluster. A given data table row and the
> corresponding index table row are likely served by different region servers,
> and the WALs of these region servers are replicated independently. This means
> the two rows can arrive at different times, which makes the data and index
> tables inconsistent at the destination cluster.
> Replicating global indexes also leads to inefficient use of the replication
> bandwidth, since it replicates data that can be derived from data that has
> already been replicated. Considering that an index table is essentially a
> copy of its data table without the columns that are not included in the
> index, and that a given data table can have multiple indexes, replicating
> indexes can easily double the replication bandwidth requirement for a given
> data table.
> A solution for eliminating index table replication is to add just enough
> metadata to the WAL records for mutations of data tables with indexes, and to
> have a replication endpoint and a coprocessor endpoint generate the index
> mutations from these records; see PHOENIX-5315. This document extends that
> solution not only to eliminate index table replication but also to improve
> the read and write paths for index tables by maintaining a consistent
> server-side cache of index maintainers.
> The idea behind the proposed solution is to cache the index maintainers on
> the server side and thus eliminate transferring index maintainers during
> reads and writes as well as replication. The coprocessors that currently
> require the index maintainers are IndexRegionObserver on the write path and
> several coprocessors on the read path, including GlobalIndexChecker for read
> repair. The proposed solution leverages the existing capability of adding
> IndexMaintainer objects to the server-side cache implemented by
> ServerCachingEndpointImpl. The design eliminates global index table
> replication and also eliminates updating the server-side cache with
> IndexMaintainer objects for each batch write.
> IndexRegionObserver (the coprocessor that generates index mutations from data
> table mutations) needs to access the IndexMaintainer objects for the indexes
> on a table or view. Metadata transferred as a mutation attribute will be used
> to identify the table or view to which a mutation belongs. This metadata will
> include the tenant Id, table schema, and table name. The cache key for the
> array of index maintainers for this table or view will be formed from this
> metadata. When IndexRegionObserver intercepts a mutation on an HBase table
> (using the preBatchMutate coprocessor hook), it will form the cache key for
> the array of index maintainers and retrieve the array from the server cache,
> as sketched below.
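> The following sketch shows the shape of that lookup. The attribute names, the
> in-memory map used as the cache, and the key format are assumptions made for
> illustration; they are not the committed design or the actual Phoenix API.
> {code:java}
> import java.util.concurrent.ConcurrentHashMap;
>
> import org.apache.hadoop.hbase.client.Mutation;
> import org.apache.hadoop.hbase.util.Bytes;
>
> public class IndexMaintainerLookupSketch {
>     // Server-side cache: one entry per table/view, holding the whole array
>     // of maintainers (Object[] stands in for IndexMaintainer[]).
>     static final ConcurrentHashMap<String, Object[]> CACHE =
>             new ConcurrentHashMap<>();
>
>     // Cache key formed from tenant Id, schema, and table name, as described
>     // above (the "." separator is arbitrary here).
>     static String cacheKey(byte[] tenantId, byte[] schema, byte[] table) {
>         return Bytes.toString(tenantId) + "." + Bytes.toString(schema) + "."
>                 + Bytes.toString(table);
>     }
>
>     // Called from the preBatchMutate path for each intercepted mutation.
>     static Object[] maintainersFor(Mutation m) {
>         // Hypothetical attribute names carrying the table identity.
>         String key = cacheKey(m.getAttribute("_TenantId"),
>                 m.getAttribute("_Schema"), m.getAttribute("_TableName"));
>         // On a miss the coprocessor would rebuild the array from the system
>         // catalog and repopulate the cache (not shown).
>         return CACHE.get(key);
>     }
> }
> {code}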
> This design requires maintaining metadata caches at the region servers. These
> caches need to be consistent, that is, they should never hold stale metadata.
> To ensure this, when MetaDataEndpointImpl updates index metadata, it will
> first invalidate the index maintainer caches. If the invalidation fails, the
> metadata operation fails and a failure response is returned to the client.
> It is important to note that MetaDataEndpointImpl needs to use locking to
> serialize read and write operations on the metadata. After the caches are
> invalidated, the Phoenix coprocessors will attempt to retrieve the
> IndexMaintainer objects from MetaDataEndpointImpl for the next mutation or
> scan operation. This retrieval has to wait for the ongoing metadata update
> transaction to complete.
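> A minimal sketch of that invalidate-before-commit ordering is shown below.
> The RPC stub, the lock, and the method names are placeholders chosen for
> illustration, not the actual MetaDataEndpointImpl interfaces.
> {code:java}
> import java.io.IOException;
> import java.util.List;
> import java.util.concurrent.locks.ReentrantReadWriteLock;
>
> public class MetadataUpdateSketch {
>     // Placeholder for the per-region-server cache invalidation RPC.
>     interface MaintainerCacheClient {
>         void invalidate(String cacheKey) throws IOException;
>     }
>
>     // Readers (cache re-population) take the read lock, so they block until
>     // an in-flight metadata update releases the write lock.
>     static final ReentrantReadWriteLock METADATA_LOCK =
>             new ReentrantReadWriteLock();
>
>     static void updateIndexMetadata(String cacheKey,
>             List<MaintainerCacheClient> regionServers,
>             Runnable commitMetadataChange) throws IOException {
>         METADATA_LOCK.writeLock().lock();
>         try {
>             // Step 1: invalidate every affected server cache first. If any
>             // invalidation fails, the exception propagates and the metadata
>             // operation is failed back to the client.
>             for (MaintainerCacheClient server : regionServers) {
>                 server.invalidate(cacheKey);
>             }
>             // Step 2: only then commit the metadata change.
>             commitMetadataChange.run();
>         } finally {
>             METADATA_LOCK.writeLock().unlock();
>         }
>     }
> }
> {code}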
> For every query, including point lookup queries, the Phoenix client currently
> serializes an IndexMaintainer object and attaches it to the scan object as a
> scan attribute, and Phoenix coprocessors then deserialize it from the scan
> object. To eliminate this serialization/deserialization and save the network
> bandwidth spent on IndexMaintainer, the Phoenix client can instead pass the
> cache key for the array of IndexMaintainer objects and the name of the index,
> and the coprocessor can retrieve the array of IndexMaintainer objects from
> its server cache and identify the one for the given index. If the array of
> IndexMaintainer objects is not in the cache, the coprocessor can construct it
> using the Phoenix client library and populate the server cache with it.
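> A sketch of this proposed read path follows. The attribute names and the
> cache structure are assumptions made for illustration only.
> {code:java}
> import java.util.Map;
> import java.util.concurrent.ConcurrentHashMap;
>
> import org.apache.hadoop.hbase.client.Scan;
> import org.apache.hadoop.hbase.util.Bytes;
>
> public class ProposedReadPathSketch {
>     // Client side: the scan carries only the cache key and the index name,
>     // not a serialized IndexMaintainer.
>     static Scan buildIndexScan(String cacheKey, String indexName) {
>         Scan scan = new Scan();
>         scan.setAttribute("_IndexMaintainerCacheKey", Bytes.toBytes(cacheKey));
>         scan.setAttribute("_IndexName", Bytes.toBytes(indexName));
>         return scan;
>     }
>
>     // Server side: resolve the maintainer for the named index from the
>     // cached array (a map keyed by index name stands in for the real
>     // structure). On a miss, the coprocessor would rebuild the entry via the
>     // Phoenix client library and repopulate the cache (not shown).
>     static Object maintainerFor(Scan scan,
>             ConcurrentHashMap<String, Map<String, Object>> cache) {
>         String key =
>                 Bytes.toString(scan.getAttribute("_IndexMaintainerCacheKey"));
>         String index = Bytes.toString(scan.getAttribute("_IndexName"));
>         Map<String, Object> maintainers = cache.get(key);
>         return maintainers == null ? null : maintainers.get(index);
>     }
> }
> {code}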
> A mutation or batch of mutations on a data table requires updating all the
> indexes on that data table. For the cache to be efficient, we need to have
> the IndexMaintainer objects for all of these indexes. This is the reason this
> design chooses to cache the array of IndexMaintainer objects (for a given
> table or view) instead of caching individual IndexMaintainer objects.
> This design achieves cache coherency, which affects the availability of
> metadata operations. For a metadata operation to succeed, MetaDataEndpointImpl
> must first be able to invalidate the index maintainer caches on the region
> servers, so that Phoenix coprocessors retrieve the most recent metadata from
> MetaDataEndpointImpl when they next need it. Depending on how table regions
> are distributed over the region servers, a given metadata change may require
> invalidating a subset or all of the server caches.
> It is important to note that overall cluster availability is not
> significantly impacted, since metadata operations are rare compared to
> read/write operations on user data, both in number and in frequency.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)