duongkame commented on PR #4346: URL: https://github.com/apache/ozone/pull/4346#issuecomment-1458843989
> We need to check if the client is caching or exposing the pipeline-refreshed results in any way. One alternative without breaking behavior would be adding a lighter-weight list keys API (something discussed in the context of `ls` command and list bucket API context).

In `ozone-client`, I don't see any code that uses the block locations from a `listStatus` result. Block location info is used by the `RpcClient` `getKey` and `getFile` APIs, which read key metadata individually from OM and create an `OzoneInputStream` that reads data from datanodes using that block location information. `OzoneInputStream` retries reading the individual key metadata if it finds that the metadata doesn't work.

OFS `listStatus` converts the file information and block locations into `org.apache.hadoop.fs.LocatedFileStatus`. I'm not sure how external dependencies use this, but I guess any data read eventually has to go through `OzoneInputStream`.

There's one problem left: newer clients will have `OzoneInputStream` retry getting the block location with `getKeyInfo(cacheRefresh=true)`, which forces the container location cache in OM to refresh. Older clients, however, call `lookupKey`, which calls SCM directly to get the container location. If OM caches an outdated container location, older clients will see a recurring performance degradation.

That said, I still don't see any use of the block information from a `listStatus` result.
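To make the newer-client retry path concrete, here is a minimal sketch of the "read, and on failure re-fetch the location with a forced refresh" pattern. It is only an illustration under assumptions: `KeyInfoSource`, `BlockReader`, and `readWithRefresh` are hypothetical names made up for this sketch, not the actual `OzoneInputStream` or OM client code, and the boolean flag merely stands in for the `cacheRefresh=true` behavior mentioned in this comment.

```java
import java.io.IOException;

public class RefreshOnFailureSketch {

    /** Hypothetical stand-in for the OM lookup; NOT the real Ozone interface. */
    public interface KeyInfoSource {
        String getBlockLocation(String key, boolean forceRefresh) throws IOException;
    }

    /** Hypothetical stand-in for reading a block from a datanode. */
    public interface BlockReader {
        byte[] read(String location) throws IOException;
    }

    /**
     * Reads with the possibly cached location first. If that read fails,
     * asks for the location again with forceRefresh=true and retries once.
     */
    public static byte[] readWithRefresh(KeyInfoSource om, BlockReader reader, String key)
            throws IOException {
        String location = om.getBlockLocation(key, false);
        try {
            return reader.read(location);
        } catch (IOException staleLocation) {
            // Assumption: a failed read means the cached location may be outdated.
            String fresh = om.getBlockLocation(key, true);
            return reader.read(fresh);
        }
    }
}
```

In this sketch only a client that passes `forceRefresh=true` ever causes the cached location to be replaced, which is the gap for older clients: they never send that signal, so a stale cached entry would stay in place for them.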
