[jira] [Updated] (RANGER-4959) [Ranger-Admin] Remove use of x_tag table and tags_text from x_service_resource table which has duplicate data

Anand Nadar (Jira) Mon, 14 Oct 2024 06:53:04 -0700


     [ 
https://issues.apache.org/jira/browse/RANGER-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Anand Nadar updated RANGER-4959:
--------------------------------
    Description: 
Below are the metrics collected for the importservicetags API, which creates 
tags and tag-resource associations.
|Duration|10 min|
|Successful requests|49|
|Tags per resource|6|
|Columns (resources) per request|100|
|Tag-resource mappings per request|6 * 100 = 600|
|Total tag-resource mappings|49 * 600 = 29,400 records in DB|
|Rate per min|29,400 / 10 = 2,940|
Extrapolating this rate to a larger deployment:
 * Number of tables = 200,000
 * Number of columns per table = 500
 * Average tags per column = 6
 * Total tag-resource mappings = 200,000 * 500 * 6 = 600,000,000
 * Time at the measured rate with 10 threads = 600,000,000 / 2,940 ≈ 204,081 min ≈ 141.7 days
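
The projection above can be reproduced with a short calculation. This is only a sketch of the arithmetic using the figures quoted in this description; the variable names are illustrative and are not part of Ranger.

```python
# Measured run (figures from the table above).
duration_min = 10
successful_requests = 49
mappings_per_request = 6 * 100  # 6 tags on each of 100 columns

total_mappings = successful_requests * mappings_per_request
rate_per_min = total_mappings / duration_min

# Projected deployment (figures from the list above).
tables = 200_000
columns_per_table = 500
tags_per_column = 6

projected_mappings = tables * columns_per_table * tags_per_column
projected_minutes = projected_mappings / rate_per_min
projected_days = projected_minutes / (60 * 24)

print(f"measured rate: {rate_per_min:.0f} mappings/min")
print(f"projected mappings: {projected_mappings:,}")
print(f"projected time: {projected_days:.1f} days")
```

The estimate assumes the measured rate stays constant as the data volume grows, which the measurement itself does not establish.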

These figures were measured for the importservicetags API with 2 GB of heap memory.

 

To improve this performance, we propose the following changes:
 # Remove usage of the x_tag table.
 # In the x_tag_resource_map table, associate the tag def ID directly with the 
resource ID.
 # Add a new column "tagAttrs" to x_tag_resource_map to store the tag 
attributes.
 # Reduce the size of tags_text, which is stored in the x_service_resource 
table, to the data below.

{code:java}
[{"id":1069576,"isEnabled":true,"name":"TAG1","attributes":{"restricted1":"true"}}]
 {code}
id, isEnabled, name - taken from x_tag_def.
attributes - retrieved from the x_tag_resource_map table for that particular 
resource.
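
As a rough illustration of how tags_text could be assembled under this proposal, the sketch below joins x_tag_def with x_tag_resource_map in an in-memory SQLite database. The column names (is_enabled, tag_def_id, res_id), the resource ID 42, and the helper build_tags_text are assumptions made for the example; only the table names, the "tagAttrs" column, and the JSON shape come from this description. It is not the Ranger implementation.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified columns; the real Ranger schema has more.
    CREATE TABLE x_tag_def (id INTEGER PRIMARY KEY, name TEXT, is_enabled INTEGER);
    CREATE TABLE x_tag_resource_map (
        tag_def_id INTEGER REFERENCES x_tag_def(id),
        res_id     INTEGER,
        tagAttrs   TEXT   -- JSON-encoded tag attributes for this resource
    );
""")
conn.execute("INSERT INTO x_tag_def VALUES (1069576, 'TAG1', 1)")
conn.execute(
    "INSERT INTO x_tag_resource_map VALUES (?, ?, ?)",
    (1069576, 42, json.dumps({"restricted1": "true"})),
)

def build_tags_text(conn, res_id):
    """Build the compact tags_text JSON for one resource (hypothetical helper)."""
    rows = conn.execute(
        """SELECT d.id, d.is_enabled, d.name, m.tagAttrs
           FROM x_tag_resource_map m
           JOIN x_tag_def d ON d.id = m.tag_def_id
           WHERE m.res_id = ?
           ORDER BY d.id""",
        (res_id,),
    ).fetchall()
    tags = [
        {"id": tag_id, "isEnabled": bool(enabled), "name": name,
         "attributes": json.loads(attrs) if attrs else {}}
        for tag_id, enabled, name, attrs in rows
    ]
    # Compact separators keep the stored text as small as possible.
    return json.dumps(tags, separators=(",", ":"))

tags_text = build_tags_text(conn, 42)
print(tags_text)
```

With this layout the per-resource attributes live only in the mapping row, so no x_tag row has to be created for each tag-resource association.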

 

The tag owner case is not handled by this change.
cc: [~madhan] [~avadhavkar] 


> [Ranger-Admin] Remove use of x_tag table and tags_text from 
> x_service_resource table which has duplicate data
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: RANGER-4959
>                 URL: https://issues.apache.org/jira/browse/RANGER-4959
>             Project: Ranger
>          Issue Type: Improvement
>          Components: admin
>            Reporter: Anand Nadar
>            Assignee: Anand Nadar
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
