[ https://issues.apache.org/jira/browse/RANGER-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anand Nadar updated RANGER-4959:
--------------------------------
    Description: 
Below are the metrics collected for the importservicetags API while creating tags and tag-resource associations.
|Duration|10 min|
|Successful requests|49|
|Number of tags for each resource|6|
|Number of columns|100|
|Total resource-tag mappings|6*100 = 600|
|Total tag-resource mappings overall|49*600 = 29400 records in db|
|Rate per min|2940|
* Number of tables = 200k
* Number of columns = 500
* Avg tags on each column = 6
* Total resource-tag mappings = 200000 * 500 * 6 = 600,000,000
* Time as per rate with 10 threads = 600,000,000/2940 = 204081 min = 141.722917 days

Above is the performance of the importservicetags API with 2GB heap memory.

Therefore, to improve this performance, we are trying to make the below changes.
# Remove usage of the x_tag table.
# In the x_tag_resource_map table, the association should be between the tag def id and the resource id.
# The x_tag_resource_map table will have a new column "tagAttrs" to store the tag attributes.
# tags_text, which is stored in the x_service_resource table, can have the below data to reduce its size.
{code:java}
[{"id":1069576,"isEnabled":true,"type":"TAG1","attributes":{"restricted1":"true"}}]
{code}
id, isEnabled, type(name) - These will come from x_tag_def.
attributes - These will be retrieved from the x_tag_resource_map table for that particular resource.

The tag owner case is not being handled here.

ImportserviceTags flow
- Create tagDef if not exists
- Create service resource if not exists
- Create tag def and resource association with tag attributes
- Refresh the tags_text in x_service_resource (this can be handled in a separate thread)

The download json structure will be maintained to minimise plugin side changes.
The importservicetags json input will remain the same.
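The projection at the start of this description can be reproduced with a few lines of arithmetic. The sketch below is only a check of the quoted figures; the variable names are illustrative, and reading "Number of columns = 100" as columns per request is an assumption based on the 49*600 row in the table.

```python
# Figures copied from the measurement table in the description.
DURATION_MIN = 10
SUCCESSFUL_REQUESTS = 49
TAGS_PER_RESOURCE = 6
COLUMNS_PER_REQUEST = 100  # assumption: the 100 columns are per request

mappings_per_request = TAGS_PER_RESOURCE * COLUMNS_PER_REQUEST
mappings_total = SUCCESSFUL_REQUESTS * mappings_per_request
rate_per_min = mappings_total // DURATION_MIN

# Target data set from the projection bullets.
TABLES = 200_000
COLUMNS_PER_TABLE = 500
AVG_TAGS_PER_COLUMN = 6

target_mappings = TABLES * COLUMNS_PER_TABLE * AVG_TAGS_PER_COLUMN
# The description truncates the minutes to a whole number before converting to days.
minutes_needed = target_mappings // rate_per_min
days_needed = minutes_needed / (60 * 24)

print(f"{rate_per_min} mappings/min, {minutes_needed} min, {days_needed:.2f} days")
```

At the measured rate this gives roughly 141.7 days for 600,000,000 mappings, which matches the estimate above.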
Tag delta and tag dedup will be affected and need to be handled accordingly.

cc: [~madhan] [~avadhavkar]

*According to the current implementation:*
# *x_service_resource* – 500 entries
** Example: {{ID: 1, Resource: table1, Attribute: tag_text}}
# *x_tag_def* – 6 entries
** Example: {{ID: 1, Tag Definition: tag1}}
# *x_tag* – 3000 entries
** Example: {{ID: 1, Tag Definition ID: 1, Tag: name=value}}
# *x_tag_resource_map* – 3000 entries
** Example: {{Resource ID: 1, Tag ID: 1}}
* *update tags_text of x_service_resource* with all the associated tags for each resource - 3000 updates

*Total Operations: 9506 rows*
* 500 (x_service_resource) + 6 (x_tag_def) + 3000 (x_tag) + 3000 (x_tag_resource_map) + 3000 (update x_service_resource)

----

*After the new implementation:*
# *x_tag_def* – 6 entries
** Example: {{ID: 1, Tag Definition: tag1}}
# *x_service_resource* – 500 entries
** Example: {{ID: 1, Resource: table1, Attribute: tag_text}}
** Calculate the tags associated with the resource and update the tags_text as well.
# *x_service_resource_ref_tag* – 3000 entries
** Example: {{Tag Definition ID: 1, Resource ID: 1}}

*Total Insert Operations: 3506 rows*
* 500 (x_service_resource) + 6 (x_tag_def) + 3000 (x_service_resource_ref_tag)

  was:
Below are the metrics collected for the importservicetags API while creating tags and tag-resource associations.
|Duration|10 min|
|Successful requests|49|
|Number of tags for each resource|6|
|Number of columns|100|
|Total resource-tag mappings|6*100 = 600|
|Total tag-resource mappings overall|49*600 = 29400 records in db|
|Rate per min|2940|
* Number of tables = 200k
* Number of columns = 500
* Avg tags on each column = 6
* Total resource-tag mappings = 200000 * 500 * 6 = 600,000,000
* Time as per rate with 10 threads = 600,000,000/2940 = 204081 min = 141.722917 days

Above is the performance of the importservicetags API with 2GB heap memory.

Therefore, to improve this performance, we are trying to make the below changes.
# Remove usage of the x_tag table.
# In the x_tag_resource_map table, the association should be between the tag def id and the resource id.
# The x_tag_resource_map table will have a new column "tagAttrs" to store the tag attributes.
# tags_text, which is stored in the x_service_resource table, can have the below data to reduce its size.
{code:java}
[{"id":1069576,"isEnabled":true,"name":"TAG1","attributes":{"restricted1":"true"}}]
{code}
id, isEnabled, name - These will come from x_tag_def.
attributes - These will be retrieved from the x_tag_resource_map table for that particular resource.

The tag owner case is not being handled here.

ImportserviceTags flow
- Create tagDef if not exists
- Create service resource if not exists
- Create tag def and resource association with tag attributes
- Refresh the tags_text in x_service_resource (this can be handled in a separate thread)

The download json structure will be maintained to minimise plugin side changes.
The importservicetags json input will remain the same.
Tag delta and tag dedup will be affected and need to be handled accordingly.

cc: [~madhan] [~avadhavkar]

*According to the current implementation:*
# *x_service_resource* – 500 entries
** Example: {{ID: 1, Resource: table1, Attribute: tag_text}}
# *x_tag_def* – 6 entries
** Example: {{ID: 1, Tag Definition: tag1}}
# *x_tag* – 3000 entries
** Example: {{ID: 1, Tag Definition ID: 1, Tag: name=value}}
# *x_tag_resource_map* – 3000 entries
** Example: {{Resource ID: 1, Tag ID: 1}}
* *update tags_text of x_service_resource* with all the associated tags for each resource - 3000 updates

*Total Operations: 9506 rows*
* 500 (x_service_resource) + 6 (x_tag_def) + 3000 (x_tag) + 3000 (x_tag_resource_map) + 3000 (update x_service_resource)

----

*After the new implementation:*
# *x_tag_def* – 6 entries
** Example: {{ID: 1, Tag Definition: tag1}}
# *x_service_resource* – 500 entries
** Example: {{ID: 1, Resource: table1, Attribute: tag_text}}
** Calculate the tags associated with the resource and update the tags_text as well.
# *x_service_resource_ref_tag* – 3000 entries
** Example: {{Tag Definition ID: 1, Resource ID: 1}}

*Total Insert Operations: 3506 rows*
* 500 (x_service_resource) + 6 (x_tag_def) + 3000 (x_service_resource_ref_tag)


> [Ranger-Admin] Remove use of x_tag table
> ----------------------------------------
>
>                 Key: RANGER-4959
>                 URL: https://issues.apache.org/jira/browse/RANGER-4959
>             Project: Ranger
>          Issue Type: Improvement
>          Components: admin
>            Reporter: Anand Nadar
>            Assignee: Anand Nadar
>            Priority: Major
>
> Below are the metrics collected for the importservicetags API while creating tags and tag-resource associations.
> |Duration|10 min|
> |Successful requests|49|
> |Number of tags for each resource|6|
> |Number of columns|100|
> |Total resource-tag mappings|6*100 = 600|
> |Total tag-resource mappings overall|49*600 = 29400 records in db|
> |Rate per min|2940|
> * Number of tables = 200k
> * Number of columns = 500
> * Avg tags on each column = 6
> * Total resource-tag mappings = 200000 * 500 * 6 = 600,000,000
> * Time as per rate with 10 threads = 600,000,000/2940 = 204081 min = 141.722917 days
>
> Above is the performance of the importservicetags API with 2GB heap memory.
>
> Therefore, to improve this performance, we are trying to make the below changes.
> # Remove usage of the x_tag table.
> # In the x_tag_resource_map table, the association should be between the tag def id and the resource id.
> # The x_tag_resource_map table will have a new column "tagAttrs" to store the tag attributes.
> # tags_text, which is stored in the x_service_resource table, can have the below data to reduce its size.
> {code:java}
> [{"id":1069576,"isEnabled":true,"type":"TAG1","attributes":{"restricted1":"true"}}]
> {code}
> id, isEnabled, type(name) - These will come from x_tag_def.
> attributes - These will be retrieved from the x_tag_resource_map table for that particular resource.
>
> The tag owner case is not being handled here.
>
> ImportserviceTags flow
> - Create tagDef if not exists
> - Create service resource if not exists
> - Create tag def and resource association with tag attributes
> - Refresh the tags_text in x_service_resource (this can be handled in a separate thread)
>
> The download json structure will be maintained to minimise plugin side changes.
> The importservicetags json input will remain the same.
> Tag delta and tag dedup will be affected and need to be handled accordingly.
>
> cc: [~madhan] [~avadhavkar]
>
> *According to the current implementation:*
> # *x_service_resource* – 500 entries
> ** Example: {{ID: 1, Resource: table1, Attribute: tag_text}}
> # *x_tag_def* – 6 entries
> ** Example: {{ID: 1, Tag Definition: tag1}}
> # *x_tag* – 3000 entries
> ** Example: {{ID: 1, Tag Definition ID: 1, Tag: name=value}}
> # *x_tag_resource_map* – 3000 entries
> ** Example: {{Resource ID: 1, Tag ID: 1}}
> * *update tags_text of x_service_resource* with all the associated tags for each resource - 3000 updates
>
> *Total Operations: 9506 rows*
> * 500 (x_service_resource) + 6 (x_tag_def) + 3000 (x_tag) + 3000 (x_tag_resource_map) + 3000 (update x_service_resource)
>
> ----
>
> *After the new implementation:*
> # *x_tag_def* – 6 entries
> ** Example: {{ID: 1, Tag Definition: tag1}}
> # *x_service_resource* – 500 entries
> ** Example: {{ID: 1, Resource: table1, Attribute: tag_text}}
> ** Calculate the tags associated with the resource and update the tags_text as well.
> # *x_service_resource_ref_tag* – 3000 entries
> ** Example: {{Tag Definition ID: 1, Resource ID: 1}}
>
> *Total Insert Operations: 3506 rows*
> * 500 (x_service_resource) + 6 (x_tag_def) + 3000 (x_service_resource_ref_tag)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)