[ 
https://issues.apache.org/jira/browse/RANGER-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barbara Eckman updated RANGER-4298:
-----------------------------------
    Description: 
We are in the process of migrating our entire large data lakes (S3, MinIO) from 
resource-based to tag-based AC.  This migration is phased and estimated to 
continue through Q3.  In this phase we need to import 250K tags into Ranger 
from Apache Atlas.  At the current, single-threaded rate of 500ms/tag, this 
will take 6 to 7 days.  

Beyond this migration, we expect to be incorporating multiple external datasets 
into our data lakes at regular intervals, which will require importing roughly 
the same quantity of tags.  

So this is not a one-time need; it is expected to recur.  It would be great if 
tagsync could be made multi-threaded.

  was:
We are in the process of migrating our entire large data lakes (S3, MinIO) from 
resource-based to tag-based AC.  This migration is phased and estimated to 
continue through Q3.  In this phase we need to import 250K tags into Ranger 
from Apache Atlas.  At the current, single-threaded rate of 500ms/tag, this 
will take 6 to 7 days.  

Beyond this migration, we expect to be incorporating multiple external datasets 
into our data lakes, which will require importing roughly the same quantity of 
tags.  

So this is not a one-time need; it is expected to recur.  It would be great if 
tagsync could be made multi-threaded.


> Multi-threaded tagsync needed
> -----------------------------
>
>                 Key: RANGER-4298
>                 URL: https://issues.apache.org/jira/browse/RANGER-4298
>             Project: Ranger
>          Issue Type: Improvement
>          Components: Ranger, tagsync
>            Reporter: Barbara Eckman
>            Priority: Major
>
> We are in the process of migrating our entire large data lakes (S3, MinIO) 
> from resource-based to tag-based AC.  This migration is phased and estimated 
> to continue through Q3.  In this phase we need to import 250K tags into 
> Ranger from Apache Atlas.  At the current, single-threaded rate of 500ms/tag, 
> this will take 6 to 7 days.  
> Beyond this migration, we expect to be incorporating multiple external 
> datasets into our data lakes at regular intervals, which will require 
> importing roughly the same quantity of tags.  
> So this is not a one-time need; it is expected to recur.  It would be great 
> if tagsync could be made multi-threaded.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
