> On Sept. 28, 2022, 6:12 p.m., Abhay Kulkarni wrote: > > agents-common/src/main/java/org/apache/ranger/plugin/util/RangerServiceTagsDeltaUtil.java > > Line 102 (original), 105 (patched) > > <https://reviews.apache.org/r/74144/diff/1/?file=2270242#file2270242line108> > > > > This needs a review to reduce the M x N complexity. If there is a map > > of <resource-id, service-resource> maintained (in memory) for existing > > service-tags, then the loop at line 115 may reduce to a map lookup by > > resource-id.
Yes. Updating this block to replace inner loop with a map look up will result in significant improvement. I was planning to address that in a subsequent patch. Without the current fix, delta processing didn't complete even after 1h35m; with the the delta processing gets to completion within few minutes. Also, I think we should review in-place updating of serviceTags instance passed as argument. This would be an issue if that instance is used in another thread while it is being updated. > On Sept. 28, 2022, 6:12 p.m., Abhay Kulkarni wrote: > > security-admin/src/main/java/org/apache/ranger/biz/TagDBStore.java > > Lines 1217 (patched) > > <https://reviews.apache.org/r/74144/diff/1/?file=2270243#file2270243line1219> > > > > This change may result in processing the same tag or resource id > > multiple times if the same object appears multiple times in the list of > > deltas. Does the change cause significant performance improvement? If not, > > please consider either reverting it or replacing the Map to a HashSet to > > ensure that each object appears at most once in the subsequent loop. This change replaces use of Map<Long, Long> with Set<Long> - since the map was populated with the same value for key and value. This change doesn't introduce any change in behavior. > On Sept. 28, 2022, 6:12 p.m., Abhay Kulkarni wrote: > > security-admin/src/main/java/org/apache/ranger/biz/TagDBStore.java > > Line 1251 (original), 1250 (patched) > > <https://reviews.apache.org/r/74144/diff/1/?file=2270243#file2270243line1255> > > > > Although this change reduces dependency on JPA cache behavior, it also > > removes one consistency check on the sanity of deltas and the state of the > > database. Perhaps, performing all of these computations/database > > interactions in a dedicated, read-only new transaction will obviate need > > for such peep-hole optimizations. This also applies to changes around line > > #1269. Yes. Use of read-only transactions can help avoid JPA overheads. However, wouldn't the optimizations introduced in this patch still be helpful to avoid overheads? > On Sept. 28, 2022, 6:12 p.m., Abhay Kulkarni wrote: > > security-admin/src/main/java/org/apache/ranger/biz/TagDBStore.java > > Lines 1282 (patched) > > <https://reviews.apache.org/r/74144/diff/1/?file=2270243#file2270243line1290> > > > > Please review if this code fragment is required after reviewing the > > following API and its callers. The changes to each of tags, resources and > > tag-resource-mappings may be completely kept track in the change-records > > created in the delta-tables. > > > > --- > > > > XXServiceVersionInfoDao.updateTagVersionAndTagUpdateTime(). Earlier version used rangerTagResourceMapService.getTagIdsForResourceId() to get IDs of tags associated with a given resources. This patch replaces this logic to obtain tag IDs from XXServiceResource.tags field. It is not clear hwo updateTagVersionAndTagUpdateTime() can help here. Can you please add details? Thanks. - Madhan ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/74144/#review224714 ----------------------------------------------------------- On Sept. 28, 2022, 5:44 a.m., Madhan Neethiraj wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/74144/ > ----------------------------------------------------------- > > (Updated Sept. 28, 2022, 5:44 a.m.) > > > Review request for ranger, Ankita Sinha, Kishor Gollapalliwar, Abhay > Kulkarni, Mehul Parikh, Pradeep Agrawal, Ramesh Mani, Sailaja Polavarapu, > Subhrat Chaudhary, and Velmurugan Periasamy. > > > Bugs: RANGER-3934 > https://issues.apache.org/jira/browse/RANGER-3934 > > > Repository: ranger > > > Description > ------- > > - updated several retrieval of XXService JPA object by replacing with > retrival of serviceId > - replaced several instances of Map<Long, Long> with Set<Long>, as these > instances had same value for key and value > - avoided expensive calls to RangerTagResourceMapService.getByTagId(tagId) > and RangerTagResourceMapService.getTagIdsForResourceId(serviceResourceId) > - updated several methods in XXTagResourceMapDao to avoid loading of > XXTagResourceMap JPA object; instead created XXTagResourceMap object from > individial fields retrieved from query > > > Diffs > ----- > > > agents-common/src/main/java/org/apache/ranger/plugin/util/RangerServiceTagsDeltaUtil.java > 8d9241c1c > security-admin/src/main/java/org/apache/ranger/biz/TagDBStore.java > d8154b7de > > security-admin/src/main/java/org/apache/ranger/db/XXRMSServiceResourceDao.java > afa754ba2 > security-admin/src/main/java/org/apache/ranger/db/XXServiceDao.java > 3cc3d9cef > security-admin/src/main/java/org/apache/ranger/db/XXTagResourceMapDao.java > 3f8b5b718 > > security-admin/src/main/java/org/apache/ranger/patch/PatchForAtlasServiceDefUpdate_J10013.java > b0f71e138 > security-admin/src/main/java/org/apache/ranger/rest/TagREST.java c7cf3bfb8 > security-admin/src/main/resources/META-INF/jpa_named_queries.xml e4a2354b0 > > > Diff: https://reviews.apache.org/r/74144/diff/1/ > > > Testing > ------- > > - with Ranger database having ~1m service-resources, ~2m tags and delta of > 20k resource & 40k tags: > -- before these optmizations, tag-cache update from delta didn't complete > event after a long time (1h35m) > -- with these optmizations, tag-cache update from the same delta completed > within 10 minutes > - there is scope for further optimization > RangerServiceTagsDeltaUtil.applyDelta() - as this took almost 97% time in > updating the cache (588 seconds); compare this to > TagDBStore.getServiceTagsDelta() which completed in 13 seeconds > > > Thanks, > > Madhan Neethiraj > >
