[
https://issues.apache.org/jira/browse/PHOENIX-7727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tanuj Khurana updated PHOENIX-7727:
-----------------------------------
Description:
ServerCachingEndpointImpl coproc implements the server cache RPC protocol. One
use case of the cache RPCs is server-side index updates. Whenever the client
commits a batch of mutations and the mutation count is greater than
_*phoenix.index.mutableBatchSizeThreshold*_ (default value 3), instead of
sending the index maintainer metadata as a mutation attribute, the client uses
the server cache RPCs to populate the server cache on the region servers and
sends just the cache key as a mutation attribute. This was done as an
optimization to avoid sending duplicate index maintainer information on every
mutation of the batch. It is typical to have batches of 100 - 1000 mutations,
so the optimization is useful, but this RPC approach has several downsides.
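For reference, the branch point the client takes at this threshold is roughly
as sketched here (a minimal illustration; the attribute names and helper
methods are hypothetical stand-ins, not the actual Phoenix client API):
{code:java}
import java.util.List;

public class IndexMetadataAttachSketch {

    // Mirrors phoenix.index.mutableBatchSizeThreshold (default 3).
    static final int MUTABLE_BATCH_SIZE_THRESHOLD = 3;

    // Minimal stand-in for an HBase mutation.
    interface Mutation {
        void setAttribute(String name, byte[] value);
    }

    static void attachIndexMetadata(List<Mutation> batch,
                                    byte[] serializedIndexMaintainers,
                                    byte[] cacheKey) {
        if (batch.size() > MUTABLE_BATCH_SIZE_THRESHOLD) {
            // Large batch: populate the server cache once per target region server
            // (RPC fan-out not shown) and tag every mutation with only the key.
            sendServerCacheRpcs(cacheKey, serializedIndexMaintainers);
            for (Mutation m : batch) {
                m.setAttribute("INDEX_CACHE_KEY", cacheKey);   // illustrative attribute name
            }
        } else {
            // Small batch: ship the full index maintainer payload on each mutation.
            for (Mutation m : batch) {
                m.setAttribute("INDEX_MAINTAINERS", serializedIndexMaintainers); // illustrative name
            }
        }
    }

    static void sendServerCacheRpcs(byte[] cacheKey, byte[] payload) {
        // Placeholder for the ServerCachingEndpointImpl cache-population fan-out.
    }
}
{code}
The downsides listed below all come from the cache-RPC branch of this check.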
# In order to determine which region servers to send the cache RPCs to, we
first create the scan ranges object from the primary keys in the mutations. The
size of the scan ranges object is proportional to the commit batch size, which
adds GC overhead since we do this on every commit batch.
# The client then calls _*getAllTableRegions*_, which can make calls to meta if
the table region locations are not cached in the HBase client meta cache,
adding latency on the client side. Once it receives the region list, it
intersects the region boundaries with the scan ranges it constructed to
determine the region servers that host the regions which will receive the
mutations.
# The actual cache RPCs are then executed in parallel, but they are subject to
the standard HBase client retry policies and can be retried in case of timeouts
or regions in transition (RIT), potentially adding more latency (the overall
fan-out is sketched after this list).
# Furthermore, when the server processes these mutations in the
IndexRegionObserver coproc and tries to fetch the index maintainer metadata
from the cache, it is not guaranteed to find the cache entry. This happens when
the region moves or splits after the cache RPC is sent but before the data
table mutations arrive. It also happens when the server is overloaded and RPCs
queue up on the server, so that by the time the server processes the batch RPC
the cache entry has expired (default TTL 30s). If the metadata is not found, a
DoNotRetryIOException is returned to the client, which is handled within the
Phoenix MutationState class: the Phoenix client retries and repeats the whole
sequence. Worse, when the Phoenix client receives this error, it first calls
_*clearTableRegionCache*_ before retrying.
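The client-side fan-out in steps 1-3 looks roughly like the following sketch.
All types and method bodies are hypothetical stand-ins for the Phoenix/HBase
internals; it is only meant to show the per-commit work, not the real
implementation:
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ServerCacheFanOutSketch {

    record KeyRange(byte[] lower, byte[] upper) {}   // step 1: one range per row key
    record RegionLocation(String hostAndPort, byte[] startKey, byte[] endKey) {}

    static void populateServerCache(List<byte[]> rowKeysInBatch,
                                    byte[] cacheKey,
                                    byte[] indexMaintainerPayload) throws Exception {
        // Step 1: build scan ranges from every primary key in the commit batch
        // (object count proportional to the batch size, hence GC pressure).
        List<KeyRange> scanRanges = new ArrayList<>();
        for (byte[] rowKey : rowKeysInBatch) {
            scanRanges.add(new KeyRange(rowKey, rowKey));
        }

        // Step 2: fetch all table regions (may hit hbase:meta on a cold cache),
        // then intersect region boundaries with the scan ranges to pick targets.
        List<RegionLocation> allRegions = getAllTableRegions();           // hypothetical
        List<RegionLocation> targets = intersect(allRegions, scanRanges); // hypothetical

        // Step 3: one cache-population RPC per target server, submitted in parallel.
        // In the real client each RPC follows the standard HBase retry/timeout policy.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, targets.size()));
        try {
            List<Future<Void>> futures = new ArrayList<>();
            for (RegionLocation target : targets) {
                futures.add(pool.submit((Callable<Void>) () -> {
                    addServerCacheRpc(target, cacheKey, indexMaintainerPayload); // hypothetical
                    return null;
                }));
            }
            for (Future<Void> f : futures) {
                f.get(); // blocks the commit path until every RPC completes
            }
        } finally {
            pool.shutdown();
        }
    }

    static List<RegionLocation> getAllTableRegions() { return List.of(); }
    static List<RegionLocation> intersect(List<RegionLocation> regions, List<KeyRange> ranges) { return regions; }
    static void addServerCacheRpc(RegionLocation target, byte[] key, byte[] payload) {}
}
{code}
All of this work, the object allocation, the meta lookups, and the parallel
RPCs with their retries, sits on the commit path of every batch.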
Sample error logs that we have seen in production:
{code:java}
2025-10-20 07:38:21,800 INFO  [t.FPRWQ.Fifo.write.handler=120,queue=20,port=60020] util.IndexManagementUtil - Rethrowing
org.apache.hadoop.hbase.DoNotRetryIOException: ERROR 2008 (INT10): ERROR 2008 (INT10): Unable to find cached index metadata. key=4619765145502425070 region=FOO.TEST1,00D1H000000N1TASDER,1708858336233.1ae49454ee9993697a7cc9e34c899b25.host=server.net,60020,1757812136389 Index update failed
	at org.apache.phoenix.util.ClientUtil.createIOException(ClientUtil.java:166)
	at org.apache.phoenix.util.ClientUtil.throwIOException(ClientUtil.java:182)
	at org.apache.phoenix.index.PhoenixIndexMetaDataBuilder.getIndexMetaDataCache(PhoenixIndexMetaDataBuilder.java:101)
	at org.apache.phoenix.index.PhoenixIndexMetaDataBuilder.getIndexMetaData(PhoenixIndexMetaDataBuilder.java:51)
	at org.apache.phoenix.index.PhoenixIndexBuilder.getIndexMetaData(PhoenixIndexBuilder.java:92)
	at org.apache.phoenix.index.PhoenixIndexBuilder.getIndexMetaData(PhoenixIndexBuilder.java:69)
	at org.apache.phoenix.hbase.index.builder.IndexBuildManager.getIndexMetaData(IndexBuildManager.java:85)
	at org.apache.phoenix.hbase.index.IndexRegionObserver.getPhoenixIndexMetaData(IndexRegionObserver.java:1090)
	at org.apache.phoenix.hbase.index.IndexRegionObserver.preBatchMutateWithExceptions(IndexRegionObserver.java:1214)
	at org.apache.phoenix.hbase.index.IndexRegionObserver.preBatchMutate(IndexRegionObserver.java:514)
	at {code}
There is a better solution which addresses most of the above problems.
Previously, the IndexRegionObserver coproc didn't have the logical name of the
table when it was processing a batch of mutations, so it couldn't tell whether
the entity into which data is being upserted is a table or a view. Because of
this, the server couldn't determine whether the entity in question has an
index, so it relied on the client to tell it by annotating the mutations. But
PHOENIX-5521 started annotating each mutation with enough metadata for the
server to deterministically figure out the Phoenix schema object the mutation
targets. With this information the server can simply call _*getTable()*_ and
rely on the CQSI cache. Depending on the UPDATE_CACHE_FREQUENCY set on the
table, we can control the schema freshness. There are already other places on
the server where we make getTable calls, such as during compaction and server
metadata caching.
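A minimal sketch of the proposed server-side resolution, assuming the
per-mutation annotations from PHOENIX-5521 (tenant id, schema name, logical
table name) are readable in IndexRegionObserver; the interfaces and method
names below are illustrative stand-ins, not the actual Phoenix API:
{code:java}
import java.util.List;

public class ServerSideIndexMetadataSketch {

    // What PHOENIX-5521 conceptually puts on each mutation.
    record MutationAnnotation(String tenantId, String schemaName, String logicalTableName) {}

    // Stand-in for the server-side ConnectionQueryServices (CQSI) PTable cache.
    interface SchemaCache {
        PTableView getTable(String tenantId, String schemaName, String tableName);
    }

    // Stand-in for PTable: only the piece the index write path needs.
    interface PTableView {
        List<byte[]> serializedIndexMaintainers();
    }

    /**
     * Proposed flow: instead of looking up a client-populated cache entry by key,
     * resolve the table from the CQSI cache and derive the index maintainers.
     * Freshness is bounded by the table's UPDATE_CACHE_FREQUENCY, just as for the
     * existing getTable() calls made during compaction and server metadata caching.
     */
    static List<byte[]> resolveIndexMaintainers(SchemaCache cqsiCache, MutationAnnotation ann) {
        PTableView table = cqsiCache.getTable(ann.tenantId(), ann.schemaName(),
                ann.logicalTableName());
        return table.serializedIndexMaintainers();
    }
}
{code}
Because the CQSI cache is shared across batches on the region server, the index
maintainer lookup no longer depends on a per-commit cache entry that can expire
or go missing.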
This will greatly simplify the implementation and should also improve batch
write times on tables with indexes.
> Eliminate IndexMetadataCache rpcs and use server side cqsi PTable cache for
> index maintainer metadata
> -----------------------------------------------------------------------------------------------------
>
> Key: PHOENIX-7727
> URL: https://issues.apache.org/jira/browse/PHOENIX-7727
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Tanuj Khurana
> Assignee: Tanuj Khurana
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)