[ https://issues.apache.org/jira/browse/SENTRY-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16406587#comment-16406587 ]
Na Li commented on SENTRY-2184:
-------------------------------
[~akolb] 1) How it works:
1.1) When JDO retrieves an object from the datastore, typically not all of its
fields are retrieved immediately. For efficiency, only particular field types
are retrieved on the initial access of the object; the remaining fields and
related objects are retrieved when they are accessed (lazy loading).
1.2) The group of fields that are loaded is called a *fetch group*.
1.3) The basic fetch group defines which fields are to be fetched. It doesn't
explicitly define how far down an object graph is to be fetched. JDO2 provides
two ways of controlling this.
The first is to set the *maxFetchDepth* for the _FetchPlan_. This value
specifies how far out from the root object the related objects, defined in the
fetch group, will be fetched. A positive value means that this number of
relationships will be traversed from the root object. A value of -1 means that
no limit will be placed on the fetching traversal. The default is 1. Let's take
an example.
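The fetch-depth rule in 1.3 can be illustrated with a toy model. The sketch below is plain Java, not JDO or DataNucleus code: `Node`, `FetchDepthModel` and `eagerlyLoaded` are hypothetical names, and the model captures only the traversal rule described above (a positive maxFetchDepth limits how many relationships are followed from the root, -1 removes the limit). It does not model fetch groups or the lazy loading of individual fields.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a persistent object and its relationships (not a JDO class).
final class Node {
    final String name; // identifies the object in this model; assumed unique
    final List<Node> related = new ArrayList<>();

    Node(String name) {
        this.name = name;
    }

    // Adds a relationship from this object to another and returns this for chaining.
    Node link(Node other) {
        related.add(other);
        return this;
    }
}

// Toy model of the maxFetchDepth rule; names and structure are illustrative only.
final class FetchDepthModel {
    static final int UNLIMITED = -1;

    // Returns the names of the objects that would be fetched together with the root.
    static Set<String> eagerlyLoaded(Node root, int maxFetchDepth) {
        // This model only accepts the two kinds of value described above.
        if (maxFetchDepth != UNLIMITED && maxFetchDepth < 1) {
            throw new IllegalArgumentException("model accepts a positive depth or -1");
        }
        Set<String> loaded = new LinkedHashSet<>();
        loaded.add(root.name);
        Deque<Node> frontier = new ArrayDeque<>();
        frontier.add(root);
        int depth = 0;
        // Breadth-first: each pass follows one more relationship away from the root.
        while (!frontier.isEmpty() && (maxFetchDepth == UNLIMITED || depth < maxFetchDepth)) {
            Deque<Node> next = new ArrayDeque<>();
            for (Node node : frontier) {
                for (Node child : node.related) {
                    if (loaded.add(child.name)) { // skip already-fetched objects (also stops cycles)
                        next.add(child);
                    }
                }
            }
            frontier = next;
            depth++;
        }
        return loaded;
    }

    public static void main(String[] args) {
        Node root = new Node("root");
        Node a = new Node("a");
        Node b = new Node("b");
        root.link(a);
        a.link(b);
        b.link(new Node("c"));
        System.out.println(eagerlyLoaded(root, 1));         // [root, a]
        System.out.println(eagerlyLoaded(root, 2));         // [root, a, b]
        System.out.println(eagerlyLoaded(root, UNLIMITED)); // [root, a, b, c]
    }
}
```

Breadth-first traversal is used so that each object is reached at its shortest distance from the root, which is what the depth limit is measured against. In real JDO code the limit is set on the persistence manager's fetch plan via `FetchPlan.setMaxFetchDepth(int)`.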
2) I checked the log messages. Without the fix, a separate query for MPath is
issued for each MAuthzPathsMapping instance. With the fix, only one query for
MPath is issued for all MAuthzPathsMapping instances. So the performance
improvement depends on how many MAuthzPathsMapping instances are in the full
snapshot and how long it takes to get the response to one query.
For example, with an Oracle DB, the log below shows the behavior without the
fix. On average, it took 144 ms to get the MPath instances for a single
MAuthzPathsMapping instance. For Oracle, there are two queries per instance: one
for the count and one for the actual MPath values.
The time spent getting all MPath values in this environment is:
{color:#d04437}144 ms * Number_of_MAuthzPathsMapping_instances{color}.
With the fix, the time to get all MPath values is around {color:#205081}*144
ms*{color}. So the more MAuthzPathsMapping instances in the full path snapshot,
the more significant the improvement will be.
{code:java}
Line 2629: 2018-03-12 18:29:44,096 DEBUG DataNucleus.Datastore.Native: SELECT COUNT(*) FROM AUTHZ_PATH THIS WHERE THIS.AUTHZ_OBJ_ID=<278282>
Line 3690: 2018-03-12 18:29:44,178 DEBUG DataNucleus.Datastore.Native: SELECT 'org.apache.sentry.provider.db.service.model.MPath' AS NUCLEUS_TYPE,A0.PATH_NAME,A0.PATH_ID FROM AUTHZ_PATH A0 WHERE A0.AUTHZ_OBJ_ID = <278282>
Line 5581: 2018-03-12 18:29:44,261 DEBUG DataNucleus.Datastore.Native: SELECT COUNT(*) FROM AUTHZ_PATH THIS WHERE THIS.AUTHZ_OBJ_ID=<278283>
Line 7757: 2018-03-12 18:29:44,344 DEBUG DataNucleus.Datastore.Native: SELECT 'org.apache.sentry.provider.db.service.model.MPath' AS NUCLEUS_TYPE,A0.PATH_NAME,A0.PATH_ID FROM AUTHZ_PATH A0 WHERE A0.AUTHZ_OBJ_ID = <278283>
Line 8807: 2018-03-12 18:29:44,425 DEBUG DataNucleus.Datastore.Native: SELECT COUNT(*) FROM AUTHZ_PATH THIS WHERE THIS.AUTHZ_OBJ_ID=<278284>
Line 9892: 2018-03-12 18:29:44,506 DEBUG DataNucleus.Datastore.Native: SELECT 'org.apache.sentry.provider.db.service.model.MPath' AS NUCLEUS_TYPE,A0.PATH_NAME,A0.PATH_ID FROM AUTHZ_PATH A0 WHERE A0.AUTHZ_OBJ_ID = <278284>
Line 10915: 2018-03-12 18:29:44,589 DEBUG DataNucleus.Datastore.Native: SELECT COUNT(*) FROM AUTHZ_PATH THIS WHERE THIS.AUTHZ_OBJ_ID=<278285>
Line 12103: 2018-03-12 18:29:44,671 DEBUG DataNucleus.Datastore.Native: SELECT 'org.apache.sentry.provider.db.service.model.MPath' AS NUCLEUS_TYPE,A0.PATH_NAME,A0.PATH_ID FROM AUTHZ_PATH A0 WHERE A0.AUTHZ_OBJ_ID = <278285>
{code}
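The difference between the two access patterns can be sketched as a query-counting simulation. This is a hypothetical in-memory model, not Sentry or DataNucleus code: `PathTableSim`, `SnapshotLoader` and their methods are illustrative names. `perMapping` issues one lookup per MAuthzPathsMapping, like the lazy loads in the log above; `batched` fetches the MPath rows for all requested mappings in one query and groups them by AUTHZ_OBJ_ID. The model counts one round trip per lazy load, whereas the Oracle log above shows two (a COUNT(*) followed by the SELECT).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the AUTHZ_PATH table; counts simulated round trips.
final class PathTableSim {
    private final List<Long> authzObjIds = new ArrayList<>();
    private final List<String> pathNames = new ArrayList<>();
    int queriesIssued = 0;

    void insert(long authzObjId, String pathName) {
        authzObjIds.add(authzObjId);
        pathNames.add(pathName);
    }

    // Like "... FROM AUTHZ_PATH A0 WHERE A0.AUTHZ_OBJ_ID = ?": one round trip per call.
    List<String> selectByObjId(long authzObjId) {
        queriesIssued++;
        List<String> result = new ArrayList<>();
        for (int i = 0; i < authzObjIds.size(); i++) {
            if (authzObjIds.get(i) == authzObjId) { // Long unboxed, numeric comparison
                result.add(pathNames.get(i));
            }
        }
        return result;
    }

    // One round trip returning the rows for all requested ids, grouped by AUTHZ_OBJ_ID.
    Map<Long, List<String>> selectGrouped(Collection<Long> wantedIds) {
        queriesIssued++;
        Map<Long, List<String>> grouped = new LinkedHashMap<>();
        for (Long id : wantedIds) {
            grouped.put(id, new ArrayList<>()); // mappings without paths still get an entry
        }
        for (int i = 0; i < authzObjIds.size(); i++) {
            List<String> bucket = grouped.get(authzObjIds.get(i));
            if (bucket != null) {
                bucket.add(pathNames.get(i));
            }
        }
        return grouped;
    }
}

// Hypothetical loaders contrasting the two access patterns.
final class SnapshotLoader {
    // Without the fix: N mappings -> N path queries (one lazy load per mapping).
    static Map<Long, List<String>> perMapping(PathTableSim table, List<Long> mappingIds) {
        Map<Long, List<String>> image = new LinkedHashMap<>();
        for (long id : mappingIds) {
            image.put(id, table.selectByObjId(id));
        }
        return image;
    }

    // With batching: N mappings -> 1 path query.
    static Map<Long, List<String>> batched(PathTableSim table, List<Long> mappingIds) {
        return table.selectGrouped(mappingIds);
    }

    // Back-of-the-envelope cost of the per-mapping pattern: mappings * time per mapping.
    static long estimatedMillisWithoutFix(int mappings, long millisPerMapping) {
        return mappings * millisPerMapping;
    }
}
```

Both loaders build the same path image; only the number of round trips differs. Using the 144 ms per-mapping figure from this comment and a hypothetical 10,000 mappings, `estimatedMillisWithoutFix(10_000, 144)` gives 1,440,000 ms (24 minutes), versus roughly 144 ms for the single batched query.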
> Performance Issue: MPath is queried for each MAuthzPathsMapping in full
> snapshot
> --------------------------------------------------------------------------------
>
> Key: SENTRY-2184
> URL: https://issues.apache.org/jira/browse/SENTRY-2184
> Project: Sentry
> Issue Type: Bug
> Components: Sentry
> Affects Versions: 2.1.0
> Reporter: Na Li
> Assignee: Na Li
> Priority: Critical
> Attachments: SENTRY-2184.001.patch
>
>
> MAuthzPathsMapping contains a list of MPath instances. From the log messages,
> when getting the full path snapshot in SentryStore.retrieveFullPathsImageCore(),
> DataNucleus issues a separate query for the MPath instances associated with
> each MAuthzPathsMapping. Therefore, getting the full path image may take a very
> long time.
> The solution is to get the MPath instances in a batch when getting the full
> path image.
> Log messages when DataNucleus issues a query for the MPath instances
> associated with each MAuthzPathsMapping:
> {code:java}
> 1) Initially, all MAuthzPathsMapping entries for the current snapshot are queried.
> 2018-03-14 11:51:23,999 (main) [DEBUG - org.datanucleus.util.Log4JLogger.debug(Log4JLogger.java:58)] SELECT 'org.apache.sentry.provider.db.service.model.MAuthzPathsMapping' AS NUCLEUS_TYPE,A0.AUTHZ_OBJ_NAME,A0.AUTHZ_SNAPSHOT_ID,A0.CREATE_TIME_MS,A0.AUTHZ_OBJ_ID FROM AUTHZ_PATHS_MAPPING A0 WHERE A0.AUTHZ_SNAPSHOT_ID = <1>
> 2) Calling authzToPaths.getPathStrings() causes MPath to be queried for each AUTHZ_OBJ_ID.
> 2018-03-14 11:52:27,700 (main) [DEBUG - org.datanucleus.util.Log4JLogger.debug(Log4JLogger.java:58)] SELECT 'org.apache.sentry.provider.db.service.model.MPath' AS NUCLEUS_TYPE,A0.PATH_NAME,A0.PATH_ID FROM AUTHZ_PATH A0 WHERE A0.AUTHZ_OBJ_ID = <1>
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)