[
https://issues.apache.org/jira/browse/HDDS-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aravindan Vijayan updated HDDS-2241:
------------------------------------
Description:
Currently, while looking up a key, the Ozone Manager (OM) fetches pipeline
information from SCM through a separate RPC for every block in the key. For
large files (> 1GB), we may end up making ~4 RPC calls for this. This can be
optimized in a couple of ways:
* We can implement a batch getContainerWithPipeline API in SCM, which returns
the pipeline information for all the blocks of a file. To bound the number of
containers passed to SCM in a single call, we can use a fixed container batch
size on the OM side. _Here, Number of calls = 1 (or k, depending on batch
size)_
* Alternatively, we can keep a simple method-local map of ContainerID ->
Pipeline, populated from SCM responses, so that we don't make repeated calls
to SCM for the same containerID within a key. _Here, Number of calls = Number
of unique containerIDs_
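
The two options above can be compared with a rough sketch. This is not the
KeyManagerImpl or SCM client code: FakeScm, getContainerWithPipelineBatch,
refreshWithBatches and refreshWithLocalMap are hypothetical names made up for
illustration, and a String stands in for the real Pipeline type. The fake
client only counts how many RPCs each approach would issue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

final class PipelineRefreshSketch {

  /** Hypothetical stand-in for the SCM client; it only counts RPCs. */
  static class FakeScm {
    int rpcCalls = 0;

    /** One RPC per container, as in the current per-block lookup. */
    String getContainerWithPipeline(long containerId) {
      rpcCalls++;
      return "pipeline-for-" + containerId;
    }

    /** Assumed batch API (option 1): one RPC for a list of containers. */
    Map<Long, String> getContainerWithPipelineBatch(List<Long> containerIds) {
      rpcCalls++;
      Map<Long, String> result = new HashMap<>();
      for (long id : containerIds) {
        result.put(id, "pipeline-for-" + id);
      }
      return result;
    }
  }

  /** Option 1: batched lookup, with the batch size bounded on the OM side. */
  static Map<Long, String> refreshWithBatches(
      FakeScm scm, List<Long> blockContainerIds, int batchSize) {
    // De-duplicate first so each container is requested at most once.
    List<Long> unique = new ArrayList<>(new LinkedHashSet<>(blockContainerIds));
    Map<Long, String> pipelines = new HashMap<>();
    for (int i = 0; i < unique.size(); i += batchSize) {
      int end = Math.min(i + batchSize, unique.size());
      pipelines.putAll(scm.getContainerWithPipelineBatch(unique.subList(i, end)));
    }
    return pipelines;
  }

  /** Option 2: method-local ContainerID -> Pipeline map, one RPC per unique ID. */
  static Map<Long, String> refreshWithLocalMap(
      FakeScm scm, List<Long> blockContainerIds) {
    Map<Long, String> cache = new HashMap<>();
    for (long id : blockContainerIds) {
      if (!cache.containsKey(id)) {
        cache.put(id, scm.getContainerWithPipeline(id));
      }
    }
    return cache;
  }
}
```

With block container IDs [1, 1, 2, 2, 3], the local-map variant would issue
one call per unique container, while the batched variant would issue one call
per batch of unique containers. The batch variant needs a new SCM API; the
local-map variant works with the existing per-container call.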
was:
Currently, while looking up a key, the Ozone Manager gets the pipeline location
information from SCM through an RPC for every block in the key. For large files
> 1GB, we may end up making ~4 RPC calls for this. This can be optimized in a
couple of ways
* We can implement a batch getContainerWithPipeline API in SCM using which we
can get the pipeline info locations for all the blocks for a file. To keep the
number of containers passed in to SCM in a single call, we can have a fixed
container batch size on the OM side. _Here, Number of calls = 1 (or k depending
on batch size)_
* Instead, we can have a simple map (method local) for ContainerID -> Pipeline
that we get from SCM so that we don't need to make repeated calls to SCM for
the same containerID for a key. _Here, Number of calls = Number of unique
containerIDs_
> Optimize the refresh pipeline logic used by KeyManagerImpl to obtain the
> pipeline for a key
> -------------------------------------------------------------------------------------------
>
> Key: HDDS-2241
> URL: https://issues.apache.org/jira/browse/HDDS-2241
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Manager
> Reporter: Aravindan Vijayan
> Assignee: Aravindan Vijayan
> Priority: Major