[jira] [Updated] (RANGER-4959) [Ranger-Admin] Remove use of x_tag table and tags_text from x_service_resource table which has duplicate data

Anand Nadar (Jira) Mon, 14 Oct 2024 06:57:07 -0700


     [ 
https://issues.apache.org/jira/browse/RANGER-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Anand Nadar updated RANGER-4959:
--------------------------------
    Description: 
Below are metrics collected for the importservicetags API while creating tags 
and tag-resource associations.
|Duration|10 min|
|Successful requests|49|
|Tags per resource|6|
|Columns (resources) per request|100|
|Tag-resource mappings per request|6 * 100 = 600|
|Total tag-resource mappings|49 * 600 = 29,400 records in DB|
|Rate per minute|29,400 / 10 = 2,940|
Extrapolating this rate to a larger deployment:
 * Number of tables = 200,000
 * Columns per table = 500
 * Average tags per column = 6
 * Total tag-resource mappings = 200,000 * 500 * 6 = 600,000,000
 * Time at the measured rate with 10 threads = 600,000,000 / 2,940 ≈ 204,081 min ≈ 
141.7 days

These figures reflect the performance of the importservicetags API with 2 GB of heap memory.
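
The projection can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the class and method names are hypothetical and not part of Ranger, and the inputs are the figures reported in this description.

```java
public class ImportEstimate {
    // Total tag-resource mappings for a catalog of the given shape.
    public static long totalMappings(long tables, long columnsPerTable, long tagsPerColumn) {
        return tables * columnsPerTable * tagsPerColumn;
    }

    // Observed throughput in mappings per minute.
    public static double ratePerMinute(long requests, long mappingsPerRequest, double minutes) {
        return requests * mappingsPerRequest / minutes;
    }

    // Days needed to import the given number of mappings at a constant rate.
    public static double daysToImport(long mappings, double ratePerMinute) {
        double minutesPerDay = 60.0 * 24.0;
        return mappings / ratePerMinute / minutesPerDay;
    }

    public static void main(String[] args) {
        // Figures taken from the measurement above: 49 requests, 6 * 100 mappings each, 10 minutes.
        double rate = ratePerMinute(49, 6 * 100, 10);
        long total = totalMappings(200_000, 500, 6);
        System.out.printf("rate=%.0f/min total=%d days=%.1f%n", rate, total, daysToImport(total, rate));
    }
}
```

The estimate assumes the measured rate stays constant as the tables grow, which the measurement itself does not establish.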

 

To improve this performance, we propose the following changes:
 # Remove usage of the x_tag table.
 # In the x_tag_resource_map table, associate the tag def id directly with the 
resource id.
 # Add a new column "tagAttrs" to x_tag_resource_map to store the tag 
attributes.
 # Reduce the size of tags_text, which is stored in the x_service_resource table, 
to the data below.

{code:java}
[{"id":1069576,"isEnabled":true,"name":"TAG1","attributes":{"restricted1":"true"}}]
 {code}
id, isEnabled, name - these values will come from x_tag_def.
attributes - these will be retrieved from the x_tag_resource_map table for that 
particular resource.
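
As a rough illustration of how such an entry could be assembled, the sketch below combines the tag def fields with the per-resource attributes. The class, method, and parameter names are hypothetical and do not come from the Ranger code base. The hand-written escaping covers only quotes and backslashes, so an actual implementation would more likely rely on a JSON library.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

public class TagsTextSketch {
    // Builds one compact tags_text entry.
    // Assumption: id, isEnabled and name are read from x_tag_def, and attributes
    // from the x_tag_resource_map row ("tagAttrs") for the resource.
    public static String toEntry(long tagDefId, boolean isEnabled, String name,
                                 Map<String, String> attributes) {
        StringJoiner attrs = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            attrs.add(quote(e.getKey()) + ":" + quote(e.getValue()));
        }
        return "{\"id\":" + tagDefId
                + ",\"isEnabled\":" + isEnabled
                + ",\"name\":" + quote(name)
                + ",\"attributes\":" + attrs + "}";
    }

    // Wraps the entries of one resource into the JSON array stored as tags_text.
    public static String toTagsText(List<String> entries) {
        return "[" + String.join(",", entries) + "]";
    }

    // Minimal escaping: backslashes and double quotes only (no control characters).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

Calling `toTagsText` with a single entry built from id 1069576, `true`, "TAG1" and the attribute restricted1=true yields the same JSON as the example above.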

 

The tag owner case is not handled here.

The download JSON structure will be maintained to minimise plugin-side changes.
The importservicetags JSON input will remain the same.
Tag delta and tag dedup will be affected and need to be handled accordingly.

cc: [~madhan] [~avadhavkar] 



> [Ranger-Admin] Remove use of x_tag table and tags_text from 
> x_service_resource table which has duplicate data
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: RANGER-4959
>                 URL: https://issues.apache.org/jira/browse/RANGER-4959
>             Project: Ranger
>          Issue Type: Improvement
>          Components: admin
>            Reporter: Anand Nadar
>            Assignee: Anand Nadar
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
