huaxiangsun commented on pull request #2570: URL: https://github.com/apache/hbase/pull/2570#issuecomment-714760677
> > Do you think "fallback to primary" logic needs to be passed down from rpc retrying caller? Then it needs to be aware of this feature and needs to maintain some state. Was trying to avoid it. Yeah, this part can be optimized later. > > If you want to make a decision based on whether the previous returned location is stale, then you need to pass something down from the rpc retrying caller to the selector, as only in rpc retrying caller we can get the exception we want. Either by adding a parameter as I proposed above to not go to secondary replicas, or as your current solution, adding a stale cache in the selector and you need to pass down the exception and the related location. > My concern here is that, the current 'simple' algorithm may not work well for some cases as my mentioned above. I suggest that we remove this part of code from the patch here(just do random selecting, without other consideration) to get this patch in quickly, and then open another issue to discuss the suitable algorithm, or maybe multiple algorithms. > Thank Duo. I think I am still not 100% clear here. Let me try to explain my understanding of what you thought before moving forward. 1. My current understanding of rpc retry caller and AsyncRegionLocator. If a location is stale, rpc retry caller will detect it and pass down exception to AsyncRegionLocator(AsyncNonMetaRegionLocator), AsyncNonMetaRegionLocator updates/cleans up table cache accordingly. 2. In the next retry, if there is no table cache entry for the key, it will call locateInMeta() to fetch location from meta. In the above flow, rpc retry caller layer is the one detecting stale location, it is up to AsyncRegionLocator layer to handle all the logic. Similarly, meta selector interface tries to do similar and build the state it needs to make decision inside itself. As for different algorithm, i.e, the case you described above, a concrete example, lets there there are 5 meta replicas, if the location from meta replica 1 stale, it may try other meta replica. We can have a different implementation of selector, AdvancedSelector, which can implement such algorithm based on the state maintained in selector. I.e, the staleCache entry has info to indicate fromMetaReplicaId. Thought that the selector interface is designed for this purpose, maybe I miss something here. The initial implementation, I kept fromMetaReplica in tableCache entry. When the patch was out for review, I dropped this change, as the current simple algorithm really does not need this info. The select interface keeps fromMetaReplicaId for future enhancement so when it is needed, at least the interface does not need to be changed. I can drop fromMetaReplicaId in the current simple implementation as it is not needed anyway. Maybe you were referring to add a flag in locateInMeta() method? If the flag says going with primary only, it will bypass all these logic and goes only to primary region so it will give upper layer more control. Please share more thoughts, thanks. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
