[ 
https://issues.apache.org/jira/browse/RANGER-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17889318#comment-17889318
 ] 

Anand Nadar commented on RANGER-4959:
-------------------------------------

Okay, got it. So there will be only one entry for each tagDefId and 
resourceId association.
For migration of existing tags: if multiple tags of the same type are 
associated with a single resource, we would need to combine all their 
attributes and store them in tags_text in x_service_resource.
The tags_text of every resource would need to be updated to the new 
structure.
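
A minimal sketch of that merge step follows. It assumes each existing tag is 
available as a dict with "type" and "attributes" keys, and that when two tags 
of the same type set the same attribute, the later value wins. The issue does 
not specify a conflict rule, so that choice is an assumption. The function name 
and input shapes are hypothetical and are not Ranger code.

```python
import json


def merge_tags_for_resource(tags, tag_defs):
    """Collapse tags of the same type into one tags_text entry per tag def.

    tags:     list of {"type": str, "attributes": dict} for ONE resource
              (hypothetical shape of the existing tag rows)
    tag_defs: {type_name: {"id": int, "isEnabled": bool}}, standing in for
              rows of x_tag_def
    """
    merged = {}
    for tag in tags:
        # One bucket per tag type; later values overwrite earlier ones
        # (assumed conflict rule, not stated in the issue).
        merged.setdefault(tag["type"], {}).update(tag.get("attributes", {}))

    entries = []
    for type_name, attrs in merged.items():
        tag_def = tag_defs[type_name]
        entries.append({
            "id": tag_def["id"],
            "isEnabled": tag_def["isEnabled"],
            "name": type_name,
            "attributes": attrs,
        })
    # Compact separators keep the stored tags_text small.
    return json.dumps(entries, separators=(",", ":"))
```

With two TAG1 tags carrying "restricted1" and "restricted2" on the same 
resource, this yields a single TAG1 entry whose attributes contain both keys.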

> [Ranger-Admin] Remove use of x_tag table
> ----------------------------------------
>
>                 Key: RANGER-4959
>                 URL: https://issues.apache.org/jira/browse/RANGER-4959
>             Project: Ranger
>          Issue Type: Improvement
>          Components: admin
>            Reporter: Anand Nadar
>            Assignee: Anand Nadar
>            Priority: Major
>
> Below are metrics retrieved for the importservicetags API, which creates tags 
> and tag-resource associations.
> |Duration|10 min|
> |Successful requests|49|
> |Number of tags for each resource|6|
> |Number of columns|100|
> |Tag-resource mappings per request|6*100 = 600|
> |Total tag-resource mappings|49*600 = 29400 records in db|
> |Rate per min|2940|
>  * Number of tables = 200k
>  * Number of columns per table = 500
>  * Avg tags on each column = 6
>  * Total tag-resource mappings = 200,000 * 500 * 6 = 600,000,000
>  * Time at this rate with 10 threads = 600,000,000 / 2940 ≈ 204,082 min ≈ 
> 141.7 days
> The above is the performance of the importservicetags API with 2 GB of heap 
> memory.
>  
> Therefore, to improve this performance, we propose the following changes:
>  # Remove usage of the x_tag table.
>  # In the x_tag_resource_map table, the association should be between the tag 
> def id and the resource id.
>  # The x_tag_resource_map table will have a new column, "tagAttrs", to store 
> the tag attributes.
>  # tags_text, which is stored in the x_service_resource table, can hold the 
> data below to reduce its size.
> {code:java}
> [{"id":1069576,"isEnabled":true,"name":"TAG1","attributes":{"restricted1":"true"}}]
>  {code}
> id, isEnabled, name - these values come from x_tag_def.
> attributes - these are retrieved from the x_tag_resource_map table for 
> that particular resource.
> The tag owner case is not handled here.
> importservicetags flow:
>  - Create the tag def if it does not exist
>  - Create the service resource if it does not exist
>  - Create the tag def and resource association with the tag attributes
>  - Refresh the tags_text in x_service_resource (this can be handled in a 
> separate thread)
> The download JSON structure will be maintained to minimise plugin-side 
> changes.
> The importservicetags JSON input will remain the same.
> Tag delta and tag dedup will be affected and need to be handled accordingly.
> cc: [~madhan] [~avadhavkar] 
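
The estimate in the quoted description can be reproduced with a few lines of 
Python. This is only a sketch of the arithmetic. It assumes the measured rate 
of 2940 mappings per minute holds linearly at the larger scale, and the 
function name is illustrative.

```python
MINUTES_PER_DAY = 24 * 60


def estimate_import_days(tables, columns_per_table, tags_per_column,
                         mappings_per_minute):
    """Return (total mappings, minutes, days) for a full import at a fixed rate."""
    total_mappings = tables * columns_per_table * tags_per_column
    minutes = total_mappings / mappings_per_minute
    return total_mappings, minutes, minutes / MINUTES_PER_DAY


# Measured run: 49 requests, each with 6 tags on 100 columns, over 10 minutes.
measured_rate = 49 * 6 * 100 / 10  # 2940 mappings per minute

total, minutes, days = estimate_import_days(200_000, 500, 6, measured_rate)
print(f"{total:,} mappings, {minutes:,.0f} min, {days:.1f} days")
```

Changing measured_rate shows how much the rate must improve to bring the 
import down to a practical duration.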



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
