[
https://issues.apache.org/jira/browse/HDDS-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
UENISHI Kota updated HDDS-6321:
-------------------------------
Description:
Under the native Ozone authorizer, every ACL check calls
[keyManager.checkAccess|#L162]. KeyManagerImpl#checkAccess [calls
getFileStatus() as
well|https://github.com/apache/ozone/blob/76aa27e7c05196ae00cba540efce4bb7529e5d15/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java#L1804],
which finally [calls pipeline refresh()|#L2011]. The pipeline refresh is not
needed here because the check only reads the key ACL and does not need block
information. It causes an additional external RPC call to SCM, which is
unnecessary overhead on each object-get.
We observed this issue in our production cluster as a 50% increase in latency,
estimated from a wall-clock profile:
!Screenshot_2022-02-15_17-35-18.png|width=739,height=452!
Also, our monitoring shows 2x key lookups on OM, which increases the SCM call
count of GetContainerWithPipeline.
!29843180-8924-11ec-8ad5-5b5a8342f2d3.png|width=797,height=245!
!2b4df500-8924-11ec-927a-de3d8adc6fe0.png|width=798,height=239!
I'm not sure how to fix this issue with regard to HDDS-3658. The cleanest way
would be to use the refreshPipeline flag again, but it would be a hassle to
consider all the cases that use getFileStatus(). HDDS-5450 may give us some
hints.
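To make the flag idea concrete, here is a minimal sketch of what skipping the refresh on the ACL path could look like. Everything in it is hypothetical: FakeScm, KeyInfo, FakeKeyManager and the boolean parameter are stand-ins invented for illustration and are not the real Ozone classes or signatures. It only shows that an ACL check served without a refresh makes no SCM call, while a normal lookup still does.

```java
// Hypothetical sketch only: these types are NOT the real Ozone classes.
final class AclCheckSketch {

    /** Stand-in for the SCM client; counts simulated RPC calls. */
    static final class FakeScm {
        int rpcCount = 0;

        String getContainerWithPipeline(long containerId) {
            rpcCount++; // each call models one external RPC to SCM
            return "pipeline-for-container-" + containerId;
        }
    }

    /** Stand-in for the key metadata that OM keeps. */
    static final class KeyInfo {
        final String aclUser;   // simplified ACL: a single allowed user
        final long containerId;
        String pipeline;        // stays null unless a refresh is requested

        KeyInfo(String aclUser, long containerId) {
            this.aclUser = aclUser;
            this.containerId = containerId;
        }
    }

    /** Stand-in for the key manager, with an assumed refreshPipeline flag. */
    static final class FakeKeyManager {
        private final FakeScm scm;
        private final KeyInfo stored;

        FakeKeyManager(FakeScm scm, KeyInfo stored) {
            this.scm = scm;
            this.stored = stored;
        }

        KeyInfo getFileStatus(boolean refreshPipeline) {
            KeyInfo copy = new KeyInfo(stored.aclUser, stored.containerId);
            if (refreshPipeline) {
                // Only callers that need block locations pay for the SCM call.
                copy.pipeline = scm.getContainerWithPipeline(copy.containerId);
            }
            return copy;
        }

        /** ACL path: needs the ACL only, so it skips the refresh. */
        boolean checkAccess(String user) {
            return getFileStatus(false).aclUser.equals(user);
        }

        /** Read path: needs block locations, so it refreshes. */
        KeyInfo lookupKey() {
            return getFileStatus(true);
        }
    }

    public static void main(String[] args) {
        FakeScm scm = new FakeScm();
        FakeKeyManager km = new FakeKeyManager(scm, new KeyInfo("alice", 7L));

        boolean allowed = km.checkAccess("alice");
        System.out.println("allowed=" + allowed
            + ", SCM calls after ACL check=" + scm.rpcCount);

        km.lookupKey();
        System.out.println("SCM calls after lookup=" + scm.rpcCount);
    }
}
```

The hard part described above is not the flag itself but auditing every existing caller of getFileStatus() to decide which value it should pass; the sketch sidesteps that by having only two callers.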
was:
In every ACL check under native Ozone authorizer, it calls
[keyManager.checkAccess|#L162].] KeyManagerImpl#checkAccess [calls
getFileStatus() as
well|https://github.com/apache/ozone/blob/76aa27e7c05196ae00cba540efce4bb7529e5d15/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java#L1804],
which finally [calls pipeline refresh()|#L2011].] Pipeline refresh is not
needed here because it just obtains key ACL and no need for blocks. This causes
additional external RPC call to SCM, which is unnecessary overhead on each
object-get.
We observed this issue in our production cluster, as 50% increase of latency
estimated from wall clock profile:
!Screenshot_2022-02-15_17-35-18.png!
Also, our monitoring shows 2x lookup key to OM, which increases SCM call count
of GetContainerWithPipeline.
!29843180-8924-11ec-8ad5-5b5a8342f2d3.png!
!2b4df500-8924-11ec-927a-de3d8adc6fe0.png!
I'm not sure how to fix this issue regarding {color:#6e7781}HDDS-3658{color} .
Cleanest way would be re-utilizing again refreshPipeline flag, but it'd be a
hustle to consider all cases using getFileStatus(). HDDS-5450 may be give us
some hints.
> Avoid refresh pipeline for key lookup in checkAcls
> --------------------------------------------------
>
> Key: HDDS-6321
> URL: https://issues.apache.org/jira/browse/HDDS-6321
> Project: Apache Ozone
> Issue Type: Bug
> Components: Ozone Manager
> Affects Versions: 1.2.0
> Environment: OM setup with Native Ozone Authorizer
> Reporter: UENISHI Kota
> Priority: Major
> Attachments: 29843180-8924-11ec-8ad5-5b5a8342f2d3.png,
> 2b4df500-8924-11ec-927a-de3d8adc6fe0.png, Screenshot_2022-02-15_17-35-18.png