[ https://issues.apache.org/jira/browse/RANGER-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Barbara Eckman updated RANGER-4298:
-----------------------------------
Description:
We are migrating our large data lakes (S3, MinIO) from resource-based to
tag-based access control (AC). This migration is phased and estimated to
continue through Q3. In this phase we need to import 250K tags into Ranger
from Apache Atlas. At the current, single-threaded rate of 500 ms/tag, this
will take 6 to 7 days.
Beyond this migration, we expect to be incorporating multiple external datasets
into our data lakes at regular intervals, which will cause roughly the same
quantity of tags to be imported.
So this is not a one-time need; it is expected to recur. It would be great if
tagsync could be made multi-threaded.
was:
We are in the process of migrating our entire large data lakes (S3, MinIO) from
resource-based to tag-based AC. This migration is phased and estimated to
continue through Q3. In this phase we need to import 250K tags into Ranger
from Apache Atlas. At the current, single-threaded rate of 500ms/tag, this
will take 6 to 7 days.
Beyond this migration, we expect to be incorporating multiple external datasets
into our datalakes, which will cause roughly the same quantity of tags to be
imported.
So this is not a one-time thing, it is expected to recur. It would be great if
tagsync could be made multi-threaded.
> Multi-threaded tagsync needed
> -----------------------------
>
> Key: RANGER-4298
> URL: https://issues.apache.org/jira/browse/RANGER-4298
> Project: Ranger
> Issue Type: Improvement
> Components: Ranger, tagsync
> Reporter: Barbara Eckman
> Priority: Major
>
> We are migrating our large data lakes (S3, MinIO) from resource-based to
> tag-based access control (AC). This migration is phased and estimated to
> continue through Q3. In this phase we need to import 250K tags into Ranger
> from Apache Atlas. At the current, single-threaded rate of 500 ms/tag, this
> will take 6 to 7 days.
> Beyond this migration, we expect to be incorporating multiple external
> datasets into our data lakes at regular intervals, which will cause roughly
> the same quantity of tags to be imported.
> So this is not a one-time need; it is expected to recur. It would be great
> if tagsync could be made multi-threaded.
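
For a rough sense of what multi-threading could buy, here is a minimal sketch
in Python (chosen for brevity; tagsync itself is Java). It is not tagsync code.
`estimate_hours` and `import_tags` are hypothetical helpers, and `upload_one`
is a stand-in for whatever call pushes a single tag to Ranger, not a real
Ranger API. The estimate assumes tags import independently and that throughput
scales linearly with worker count, which will not hold if Ranger Admin or its
database is the bottleneck.

```python
from concurrent.futures import ThreadPoolExecutor


def estimate_hours(tag_count, seconds_per_tag, workers=1):
    """Idealized wall-clock hours for an import.

    Assumes every tag costs the same and that work divides evenly
    across workers (an optimistic assumption, not a measurement).
    """
    return tag_count * seconds_per_tag / workers / 3600


def import_tags(tags, upload_one, workers=8):
    """Apply upload_one to each tag using a pool of worker threads.

    upload_one is a hypothetical callable supplied by the caller.
    Results are returned in the same order as the input tags.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(upload_one, tags))


if __name__ == "__main__":
    # Figures from the issue description: 250K tags at 500 ms each.
    for n in (1, 4, 8):
        print(f"{n} worker(s): {estimate_hours(250_000, 0.5, n):.1f} h")
```

Taken alone, the 500 ms/tag figure works out to about 35 hours for 250K tags
on a single thread, so the 6 to 7 days reported presumably includes overhead
that the per-tag rate does not capture. Any real speedup would need to be
measured against the actual Ranger deployment.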