dosubot[bot] commented on issue #8818: URL: https://github.com/apache/incubator-devlake/issues/8818#issuecomment-4153152526
This is a great question. Here's what's happening and what you can do about it.

**Why normal collection misses commits:** In incremental mode, the `gitextractor` plugin performs a **shallow git clone** (`git fetch --shallow-since=<timeAfter>`) instead of a full clone [[1]](https://github.com/apache/incubator-devlake/blob/2c15ad880dce830c6512fa58039527cbf4745b8b/backend/plugins/gitextractor/parser/clone_gitcli.go). Commits whose **parent commits fall outside the `timeAfter` window** are intentionally skipped to avoid writing incorrect diff statistics [[2]](https://github.com/apache/incubator-devlake/blob/2c15ad880dce830c6512fa58039527cbf4745b8b/backend/plugins/gitextractor/parser/repo_gogit.go). This means boundary commits get dropped from the `repo_commits` and `commit_parents` tables entirely. This behavior was reported in [Issue #8434](https://github.com/apache/incubator-devlake/issues/8434) [[3]](https://github.com/apache/incubator-devlake/issues/8434#issuecomment-2863461586).

**Why Full Refresh Mode works:** When `FullSync = true`, DevLake performs a complete `git clone --bare` with no shallow clone or time boundary, so all commits and their parents are available [[4]](https://github.com/apache/incubator-devlake/blob/2c15ad880dce830c6512fa58039527cbf4745b8b/backend/helpers/pluginhelper/api/subtask_state_manager.go).

**Configuring scheduled tasks for Full Refresh:** Unfortunately, **there is currently no built-in way to make a cron-scheduled blueprint always run in Full Refresh Mode**. The `FullSync` flag is a trigger-time override only, not a persistent blueprint setting [[5]](https://github.com/apache/incubator-devlake/blob/2c15ad880dce830c6512fa58039527cbf4745b8b/backend/core/models/blueprint.go). Your options are:

1. **Trigger via API each time** with Full Refresh:

   ```
   POST /api/blueprints/{blueprintId}/trigger
   {"skipCollectors": false, "fullSync": true}
   ```

   You could automate this with an external cron job calling this API endpoint [[6]](https://github.com/apache/incubator-devlake/blob/2c15ad880dce830c6512fa58039527cbf4745b8b/backend/server/services/blueprint.go).

2. **Extend `timeAfter` generously**: set it far enough in the past that parent commits of all relevant commits fall within the window. The maintainer suggested *"increasing the time range"* as a workaround [[3]](https://github.com/apache/incubator-devlake/issues/8434#issuecomment-2863461586).

3. **Periodically update `timeAfter`** via the UI to keep the collection window appropriate for your needs [[7]](https://github.com/apache/incubator-devlake/issues/8467#issuecomment-2976273817).

Note: `timeAfter` is stored as a **fixed timestamp** from when the blueprint was configured; it does not automatically slide forward on cron runs [[7]](https://github.com/apache/incubator-devlake/issues/8467#issuecomment-2976273817). Also, avoid setting `timeAfter` to `null` via the PATCH API, as there's an open bug ([#8516](https://github.com/apache/incubator-devlake/issues/8516)) where it gets stored as `0001-01-01` instead [[8]](https://github.com/apache/incubator-devlake/issues/8516).
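The external cron job mentioned in option 1 could look roughly like the following. This is a minimal sketch, not DevLake's own tooling: it only builds and sends the `POST /api/blueprints/{blueprintId}/trigger` request quoted above. The function names, the base URL, and the blueprint ID are hypothetical, and the sketch assumes the API is reachable without authentication; if your deployment sits behind a proxy or requires credentials, the URL and headers would need adjusting.

```python
import json
import sys
import urllib.request


def build_trigger_request(base_url: str, blueprint_id: int,
                          full_sync: bool = True) -> urllib.request.Request:
    """Build the POST request that triggers one blueprint run."""
    url = f"{base_url.rstrip('/')}/api/blueprints/{blueprint_id}/trigger"
    # Same JSON body as in the comment above.
    payload = {"skipCollectors": False, "fullSync": full_sync}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def trigger_full_refresh(base_url: str, blueprint_id: int,
                         timeout: float = 30.0) -> int:
    """Send the trigger request and return the HTTP status code."""
    request = build_trigger_request(base_url, blueprint_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical values): python trigger_full_refresh.py <base_url> <blueprint_id>
    status = trigger_full_refresh(sys.argv[1], int(sys.argv[2]))
    print(f"trigger returned HTTP {status}")
```

Saved as a script (the file name `trigger_full_refresh.py` is an assumption), it could be called from an ordinary crontab entry such as `0 2 * * * python3 trigger_full_refresh.py http://localhost:8080 1`, with the blueprint's own cron schedule disabled so that the two do not overlap.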
--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
