[
https://issues.apache.org/jira/browse/RANGER-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17889303#comment-17889303
]
Madhan Neethiraj commented on RANGER-4959:
------------------------------------------
[~anandNadar] - the proposed changes look good. A couple of further improvements
to consider:
# {{x_service_resource.tags_text}} already includes the attributes of all tags
associated with the resource, so {{x_tag_resource_map.tagAttrs}} seems
unnecessary.
# How do you plan to update the existing foreign key
{{x_tag_resource_map.tag_id}}? Given that the {{x_tag}} table will effectively
be dropped, it makes sense to point this column to {{x_tag_def.id}}. If so, is
the proposed new column {{type}} necessary?
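A minimal sketch of the second point, assuming {{x_tag_resource_map.tag_id}} is repointed to {{x_tag_def.id}}. The tag type (the tag-def name) can then be resolved through the foreign key, the same way a join on {{x_tag_def}} would resolve it, so a separate {{type}} column would store the same value twice. The records and {{TagTypeResolver}} are hypothetical in-memory stand-ins for table rows, not Ranger classes.

```java
import java.util.Map;

// Hypothetical in-memory stand-in for a row of x_tag_def.
record TagDef(long id, String name, boolean isEnabled) {}

// Hypothetical row of x_tag_resource_map. Assumption: tagId now references
// x_tag_def.id instead of x_tag.id, and there is no separate "type" column.
record TagResourceMapping(long id, long tagId, long resourceId) {}

class TagTypeResolver {
    private final Map<Long, TagDef> tagDefsById;

    TagTypeResolver(Map<Long, TagDef> tagDefsById) {
        this.tagDefsById = tagDefsById;
    }

    /** Returns the tag type (the tag-def name) by following tag_id, like a join on x_tag_def. */
    String typeOf(TagResourceMapping mapping) {
        TagDef def = tagDefsById.get(mapping.tagId());
        if (def == null) {
            throw new IllegalArgumentException("no x_tag_def row with id " + mapping.tagId());
        }
        return def.name();
    }
}
```

The trade-off is one extra lookup (or join) per mapping row, in exchange for not duplicating the tag-def name in every row of {{x_tag_resource_map}}.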
I don't think the {{x_tag.owned_by}} field is currently used, so it would be okay
not to carry this field over in the proposed changes.
REST APIs in {{TagREST.java}} that operate on {{x_tag}} entries should be
updated to throw an exception such as {{UnsupportedOperationException}}:
* createTag(RangerTag tag)
* updateTag(RangerTag tag)
* deleteTag(Long id)
* getTag(Long id)
* getTagsByType(String type)
* getAllTags()
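A sketch of how the listed endpoints might be stubbed out. Only the method names and parameter types come from the list above. The return types, the {{TagRestSketch}} class and the minimal {{RangerTag}} stand-in are assumptions, and the JAX-RS annotations and service wiring of the real {{TagREST}} are omitted.

```java
import java.util.List;

// Minimal hypothetical stand-in for Ranger's RangerTag model class.
class RangerTag {
    String type;
}

/** Sketch: the x_tag-based operations reject every call once x_tag is removed. */
class TagRestSketch {
    private static UnsupportedOperationException unsupported(String operation) {
        return new UnsupportedOperationException(
                operation + " is not supported: the x_tag table has been removed");
    }

    // Return types below are assumptions; the comment lists only names and parameters.
    RangerTag createTag(RangerTag tag) { throw unsupported("createTag"); }
    RangerTag updateTag(RangerTag tag) { throw unsupported("updateTag"); }
    void deleteTag(Long id) { throw unsupported("deleteTag"); }
    RangerTag getTag(Long id) { throw unsupported("getTag"); }
    List<RangerTag> getTagsByType(String type) { throw unsupported("getTagsByType"); }
    List<RangerTag> getAllTags() { throw unsupported("getAllTags"); }
}
```

How the exception reaches REST clients (for example, which HTTP status is returned) depends on Ranger's exception handling, which this sketch does not model.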
> [Ranger-Admin] Remove use of x_tag table
> ----------------------------------------
>
> Key: RANGER-4959
> URL: https://issues.apache.org/jira/browse/RANGER-4959
> Project: Ranger
> Issue Type: Improvement
> Components: admin
> Reporter: Anand Nadar
> Assignee: Anand Nadar
> Priority: Major
>
> Below are metrics collected for the importservicetags API while creating tags
> and tag-resource associations.
> ||Metric||Value||
> |Duration|10 min|
> |Successful requests|49|
> |Tags per resource (column)|6|
> |Columns per request|100|
> |Tag-resource mappings per request|6*100 = 600|
> |Total tag-resource mappings|49*600 = 29,400 records in DB|
> |Rate per minute|29,400/10 = 2,940|
> Extrapolating this rate to a larger dataset:
> * Number of tables = 200k
> * Columns per table = 500
> * Average tags per column = 6
> * Total tag-resource mappings = 200,000*500*6 = 600,000,000
> * Time at the measured rate (10 threads) = 600,000,000/2,940 ≈ 204,081 min ≈
> 141.72 days
> The measurements above were taken for the importservicetags API with a 2 GB heap.
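The capacity estimate above can be re-derived with plain arithmetic. This sketch only recomputes the figures from the metrics table and the extrapolation bullets; the class and method names are illustrative.

```java
/** Recomputes the RANGER-4959 capacity estimate (illustrative names). */
class ImportEstimate {
    // Measured run: 49 successful requests in 10 minutes, each tagging 100 columns with 6 tags.
    static final long REQUESTS = 49;
    static final long DURATION_MIN = 10;
    static final long MAPPINGS_PER_REQUEST = 6L * 100; // 600

    /** Tag-resource mappings written per minute during the measured run. */
    static long measuredRatePerMinute() {
        return REQUESTS * MAPPINGS_PER_REQUEST / DURATION_MIN; // 29,400 / 10 = 2,940
    }

    /** Mappings needed for a deployment of the given size. */
    static long targetMappings(long tables, long columnsPerTable, long tagsPerColumn) {
        return tables * columnsPerTable * tagsPerColumn;
    }

    /** Days needed to write the given number of mappings at the given per-minute rate. */
    static double daysAtRate(long mappings, long ratePerMinute) {
        double minutes = (double) mappings / ratePerMinute;
        return minutes / (60 * 24);
    }
}
```

The extrapolation is linear: it assumes the measured rate stays constant as the tables grow.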
>
> To improve this performance, we are proposing the following changes:
> # Remove usage of the x_tag table.
> # In the x_tag_resource_map table, associate the tag def ID with the resource
> ID.
> # Add two new columns to x_tag_resource_map: "tagAttrs", to store the tag
> attributes, and "type", which holds the name of the tagDef.
> # Reduce the size of tags_text, stored in the x_service_resource table, by
> keeping only the data below:
> {code:java}
> [{"id":1069576,"isEnabled":true,"name":"TAG1","attributes":{"restricted1":"true"}}]
> {code}
> id, isEnabled, name: taken from x_tag_def.
> attributes: retrieved from the x_tag_resource_map table for that particular
> resource.
> The tag owner case is not handled here.
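A minimal sketch of assembling the reduced {{tags_text}} from the two sources named above: {{id}}, {{isEnabled}} and {{name}} from {{x_tag_def}}, and the per-resource attributes from {{x_tag_resource_map}}. The record types and {{CompactTagsText}} are hypothetical, and the hand-rolled JSON escaping only stands in for a proper JSON library.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical row shapes: tag-def fields from x_tag_def, attributes from x_tag_resource_map.
record TagDefRow(long id, boolean isEnabled, String name) {}
record ResourceTagRow(TagDefRow tagDef, Map<String, String> attributes) {}

class CompactTagsText {
    /** Minimal JSON string quoting (quotes, backslashes, control characters). */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                default -> {
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.append('"').toString();
    }

    /** Builds tags_text: one entry per tag with id, isEnabled, name and attributes. */
    static String build(List<ResourceTagRow> rows) {
        return rows.stream().map(r -> {
            String attrs = r.attributes().entrySet().stream()
                    .map(e -> quote(e.getKey()) + ":" + quote(e.getValue()))
                    .collect(Collectors.joining(",", "{", "}"));
            return "{\"id\":" + r.tagDef().id()
                    + ",\"isEnabled\":" + r.tagDef().isEnabled()
                    + ",\"name\":" + quote(r.tagDef().name())
                    + ",\"attributes\":" + attrs + "}";
        }).collect(Collectors.joining(",", "[", "]"));
    }
}
```

For a tag def {{(1069576, true, "TAG1")}} with the single attribute {{restricted1=true}}, {{build}} produces the same string as the example in the fourth item. {{Map.of}} does not guarantee iteration order, so a {{LinkedHashMap}} would be needed if attribute order must be stable.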
> importservicetags flow:
> - Create the tagDef if it does not exist.
> - Create the service resource if it does not exist.
> - Create the tagDef-resource association with the tag attributes.
> - Refresh tags_text in x_service_resource (this can be handled in a separate
> thread).
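The four steps above could be sketched in memory as follows, assuming tag defs are keyed by type name and resources by a signature string. The maps stand in for {{x_tag_def}}, {{x_service_resource}} and {{x_tag_resource_map}}. The placeholder {{tags_text}} (a sorted list of tag names) simplifies the JSON format, and all names are hypothetical. The refresh runs on a single background thread, following the note that it can be handled separately.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;

/** In-memory sketch of the proposed importservicetags flow (all names hypothetical). */
class ImportServiceTagsSketch {
    private final AtomicLong nextId = new AtomicLong(1);
    final Map<String, Long> tagDefIdByName = new ConcurrentHashMap<>();        // x_tag_def
    final Map<Long, String> tagDefNameById = new ConcurrentHashMap<>();
    final Map<String, Long> resourceIdBySignature = new ConcurrentHashMap<>(); // x_service_resource
    // x_tag_resource_map: resourceId -> (tagDefId -> tag attributes)
    final Map<Long, Map<Long, Map<String, String>>> tagResourceMap = new ConcurrentHashMap<>();
    final Map<Long, String> tagsTextByResourceId = new ConcurrentHashMap<>();  // tags_text
    private final ExecutorService tagsTextRefresher = Executors.newSingleThreadExecutor();

    void importTag(String resourceSignature, String tagType, Map<String, String> attrs) {
        // 1. Create the tagDef if it does not exist.
        long tagDefId = tagDefIdByName.computeIfAbsent(tagType, name -> {
            long id = nextId.getAndIncrement();
            tagDefNameById.put(id, name);
            return id;
        });
        // 2. Create the service resource if it does not exist.
        long resourceId = resourceIdBySignature.computeIfAbsent(
                resourceSignature, sig -> nextId.getAndIncrement());
        // 3. Associate the tagDef with the resource, keeping the tag attributes.
        tagResourceMap.computeIfAbsent(resourceId, key -> new ConcurrentHashMap<>())
                .put(tagDefId, Map.copyOf(attrs));
        // 4. Refresh tags_text off the request thread.
        tagsTextRefresher.submit(() -> refreshTagsText(resourceId));
    }

    private void refreshTagsText(long resourceId) {
        // Placeholder serialization: the sorted tag names attached to the resource.
        String text = tagResourceMap.getOrDefault(resourceId, Map.of()).keySet().stream()
                .map(tagDefNameById::get)
                .sorted()
                .collect(Collectors.joining(","));
        tagsTextByResourceId.put(resourceId, text);
    }

    /** Waits for pending tags_text refreshes, then stops the background thread. */
    void shutdown() {
        tagsTextRefresher.shutdown();
        try {
            if (!tagsTextRefresher.awaitTermination(5, TimeUnit.SECONDS)) {
                tagsTextRefresher.shutdownNow();
            }
        } catch (InterruptedException e) {
            tagsTextRefresher.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}
```

A database-backed version would also need tagDef and resource creation to stay idempotent under concurrent imports (for example via unique constraints), which the in-memory {{computeIfAbsent}} only approximates.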
> The download JSON structure will be kept unchanged to minimise plugin-side
> changes.
> The importservicetags JSON input will remain the same.
> Tag delta and tag dedup will be affected and need to be handled accordingly.
> cc: [~madhan] [~avadhavkar]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)