Kadir Ozdemir created PHOENIX-7425:
--------------------------------------

             Summary: Partitioned CDC Index for eliminating salting
                 Key: PHOENIX-7425
                 URL: https://issues.apache.org/jira/browse/PHOENIX-7425
             Project: Phoenix
          Issue Type: Improvement
            Reporter: Kadir Ozdemir


The CDC future (PHOENX-7001) uses a global uncovered time 
(PHOENIX_ROW_TIMESTAM()) based index. Such an index will likely create hot 
spotting during writes. This is because the same index region will keep getting 
updated as the row key of the index table isĀ  PHOENIX_ROW_TIMESTAMP() + data 
table row key.

The same hot spotting can happen during reads as a small subset of index 
regions can be used for a given time range. For example, the most recent 
changes will be retrieved through one or two index regions.

To address these hot spotting issues, PHOENX-7001 suggests salting the index. 
There are three main issues with salting.

The first one is that the number of salt buckets is static and needs to be 
determined when the index is created.

The second is that salting does not work well with batch writes as it results 
in breaking a batch of writes into separate mini batches, one for each salt 
bucket. This leads to using more client threads and server RPC handlers, one 
for each salt bucket.

The last issue is that the salt buckets are not visible to applications and 
thus they cannot take advantage of the parallelism that comes with salting 
during reads. For example, there is no way for applications to use multiple 
threads, one thread for each salt bucket, for their queries.

To address all these issues that come with salting, this Jira introduces a 
built-in function for CDC indexes called PARTITION_ID(). PARTITION_ID() will be 
the prefix of an index row key (= PARTITION_ID() + PHOENIX_ROW_TIMESTAMP() + 
data table row key). PARTITION_ID() will identify the data table region of the 
data table row key. PARTITION_ID() can be the data table region start key and 
region timestamp. The timestamp is used to get different partition ids during 
region splits.

The information required to form the partition id is readily available for 
observer coprocessors. IndexRegionObserver can generate the value for 
PARTITION_ID() while generating index mutations.

Like PHOENIX_ROW_TIMESTAMP(), PARTITION_ID() can be used in CDC index queries.

By including PARTITION_ID() in the row key of an index table, we essentially 
create the effect of local index such that all index mutations for a given data 
table region are written to one index region determined by the PARITION_ID(). 
However, here we will not have the local index problem with region splits where 
copying index rows during data table region splits is required.

It is worthwhile to note that even if we attempt to use local index as CDC 
index, applications cannot directly query individual local index regions, which 
will be available with global indexes with PARTITION_ID(). The PARTITION_ID() 
creates a new class of global indexes that can be called partitioned global 
indexes. These will likely be the new local indexes for Phoenix.

The partitioned CDC indexes will eliminate the need for salting CDC indexes. 

Adding partition id will increase the row key size of the CDC index. This will 
not be an issue for storage footprint as the partition id will be the row key 
prefix and it will be compressed using row kew prefix encoding or block 
compression.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to