dosubot[bot] commented on issue #8523: URL: https://github.com/apache/incubator-devlake/issues/8523#issuecomment-3162307562
Your suspicion is correct: the plugin uses different logic for GitLab Server and Cloud. For self-hosted GitLab Server, it collects all users globally via the `/users` endpoint for every repository (data scope), which causes the same user data to be ingested multiple times and leads to excessive duplication in the `_raw_gitlab_api_users` table. For GitLab Cloud (gitlab.com, jihulab.com), it collects users per project, so duplication is less severe but still possible if users belong to multiple projects. This behavior is determined by a simple string match on the API endpoint in the code, which is brittle and can lead to inconsistent results if more SaaS domains are introduced or endpoint URLs change [source](https://github.com/apache/incubator-devlake/blob/2c15ad880dce830c6512fa58039527cbf4745b8b/backend/plugins/gitlab/tasks/account_collector.go).

For diagnostics, I suggest enabling debug logging during collection and capturing the actual API responses and parameters for both environments. Compare the number and content of user records collected from Server vs. Cloud to confirm the duplication pattern. Also, check if the same user IDs appear multiple times for different repositories in the raw table.

As an interim mitigation, you can periodically deduplicate or clean up the `_raw_gitlab_api_users` table with a scheduled script, or limit the frequency of full user data collection. For large-scale integrations, consider adapting the workaround proposed in [issue #8283](https://github.com/apache/incubator-devlake/issues/8283) to collect users only once per pipeline for GitLab Server, which has been shown to significantly reduce table size and duplication.

If you test with different GitLab Server versions or configurations, sharing your findings will help refine a long-term solution for the community. No merged PR addresses this yet, so feedback and testing on proposed workarounds are valuable.
-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@devlake.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org