Nickcw6 opened a new issue, #7750: URL: https://github.com/apache/incubator-devlake/issues/7750
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/incubator-devlake/issues?q=is%3Aissue) and found no similar issues.

### What happened

When running a data collection for a CircleCI connection, data only appears to be collected from the past 24 hours or less, regardless of what `Time Range` is set to. The same behaviour occurs in full refresh mode and in normal data collection.

The behaviour has differed slightly between attempts. When [originally raised on Slack](https://devlake-io.slack.com/archives/C03APJ20VM4/p1720800396443109), only the last ~3 hours of data was collected. When I reproduced it again to raise this issue, data from the past ~24 hours was collected.

For example, with the time range set to the start of the year, the `_tool_circleci_workflows` table contains only 18 workflows, the earliest of which occurred at `2024-07-15 10:29:09.000`. I would expect to see many more rows dating back to `2024-01-01`.

CircleCI pipeline task logs:

```
time="2024-07-16 09:34:23" level=info msg=" [pipeline service] [pipeline #12] [task #99] start executing task: 99"
time="2024-07-16 09:34:23" level=info msg=" [pipeline service] [pipeline #12] [task #99] start plugin"
time="2024-07-16 09:34:23" level=info msg=" [pipeline service] [pipeline #12] [task #99] [api async client] creating scheduler for api \"https://circleci.com/api/\", number of workers: 13, 10000 reqs / 1h0m0s (interval: 360ms)"
time="2024-07-16 09:34:23" level=info msg=" [pipeline service] [pipeline #12] [task #99] total step: 9"
time="2024-07-16 09:34:23" level=info msg=" [pipeline service] [pipeline #12] [task #99] executing subtask convertProjects"
time="2024-07-16 09:34:23" level=info msg=" [pipeline service] [pipeline #12] [task #99] [convertProjects] finished records: 1"
time="2024-07-16 09:34:23" level=info msg=" [pipeline service] [pipeline #12] [task #99] finished step: 1 / 9"
time="2024-07-16 09:34:23" level=info msg=" [pipeline service] [pipeline #12] [task #99] executing subtask collectPipelines"
time="2024-07-16 09:34:23" level=info msg=" [pipeline service] [pipeline #12] [task #99] [collectPipelines] collect pipelines"
time="2024-07-16 09:34:23" level=info msg=" [pipeline service] [pipeline #12] [task #99] [collectPipelines] start api collection"
time="2024-07-16 09:34:25" level=info msg=" [pipeline service] [pipeline #12] [task #99] [collectPipelines] finished records: 1"
time="2024-07-16 09:34:25" level=info msg=" [pipeline service] [pipeline #12] [task #99] [collectPipelines] end api collection without error"
time="2024-07-16 09:34:25" level=info msg=" [pipeline service] [pipeline #12] [task #99] finished step: 2 / 9"
time="2024-07-16 09:34:25" level=info msg=" [pipeline service] [pipeline #12] [task #99] executing subtask extractPipelines"
time="2024-07-16 09:34:25" level=info msg=" [pipeline service] [pipeline #12] [task #99] [extractPipelines] get data from _raw_circleci_api_pipelines where params={\"ConnectionId\":1,\"ProjectSlug\":\"gh/SylveraIO/web-app-mono\"} and got 20"
time="2024-07-16 09:34:25" level=info msg=" [pipeline service] [pipeline #12] [task #99] [extractPipelines] finished records: 1"
time="2024-07-16 09:34:25" level=info msg=" [pipeline service] [pipeline #12] [task #99] finished step: 3 / 9"
time="2024-07-16 09:34:25" level=info msg=" [pipeline service] [pipeline #12] [task #99] executing subtask collectWorkflows"
time="2024-07-16 09:34:25" level=info msg=" [pipeline service] [pipeline #12] [task #99] [collectWorkflows] collect workflows"
time="2024-07-16 09:34:25" level=info msg=" [pipeline service] [pipeline #12] [task #99] [collectWorkflows] start api collection"
time="2024-07-16 09:34:25" level=info msg=" [pipeline service] [pipeline #12] [task #99] [collectWorkflows] finished records: 1"
time="2024-07-16 09:34:28" level=info msg=" [pipeline service] [pipeline #12] [task #99] [collectWorkflows] finished records: 10"
time="2024-07-16 09:34:31" level=info msg=" [pipeline service] [pipeline #12] [task #99] [collectWorkflows] finished records: 19"
time="2024-07-16 09:34:32" level=info msg=" [pipeline service] [pipeline #12] [task #99] [collectWorkflows] end api collection without error"
time="2024-07-16 09:34:32" level=info msg=" [pipeline service] [pipeline #12] [task #99] finished step: 4 / 9"
time="2024-07-16 09:34:32" level=info msg=" [pipeline service] [pipeline #12] [task #99] executing subtask extractWorkflows"
time="2024-07-16 09:34:32" level=info msg=" [pipeline service] [pipeline #12] [task #99] [extractWorkflows] get data from _raw_circleci_api_workflows where params={\"ConnectionId\":1,\"ProjectSlug\":\"gh/SylveraIO/web-app-mono\"} and got 18"
time="2024-07-16 09:34:32" level=info msg=" [pipeline service] [pipeline #12] [task #99] [extractWorkflows] finished records: 1"
time="2024-07-16 09:34:32" level=info msg=" [pipeline service] [pipeline #12] [task #99] finished step: 5 / 9"
time="2024-07-16 09:34:32" level=info msg=" [pipeline service] [pipeline #12] [task #99] executing subtask collectJobs"
time="2024-07-16 09:34:32" level=info msg=" [pipeline service] [pipeline #12] [task #99] [collectJobs] collect jobs"
time="2024-07-16 09:34:32" level=info msg=" [pipeline service] [pipeline #12] [task #99] [collectJobs] start api collection"
time="2024-07-16 09:34:32" level=info msg=" [pipeline service] [pipeline #12] [task #99] [collectJobs] finished records: 1"
time="2024-07-16 09:34:35" level=info msg=" [pipeline service] [pipeline #12] [task #99] [collectJobs] finished records: 10"
time="2024-07-16 09:34:38" level=info msg=" [pipeline service] [pipeline #12] [task #99] [collectJobs] end api collection without error"
time="2024-07-16 09:34:38" level=info msg=" [pipeline service] [pipeline #12] [task #99] finished step: 6 / 9"
time="2024-07-16 09:34:38" level=info msg=" [pipeline service] [pipeline #12] [task #99] executing subtask extractJobs"
time="2024-07-16 09:34:38" level=info msg=" [pipeline service] [pipeline #12] [task #99] [extractJobs] get data from _raw_circleci_api_jobs where params={\"ConnectionId\":1,\"ProjectSlug\":\"gh/SylveraIO/web-app-mono\"} and got 162"
time="2024-07-16 09:34:38" level=info msg=" [pipeline service] [pipeline #12] [task #99] [extractJobs] finished records: 1"
time="2024-07-16 09:34:38" level=info msg=" [pipeline service] [pipeline #12] [task #99] finished step: 7 / 9"
time="2024-07-16 09:34:38" level=info msg=" [pipeline service] [pipeline #12] [task #99] executing subtask convertJobs"
time="2024-07-16 09:34:38" level=info msg=" [pipeline service] [pipeline #12] [task #99] [convertJobs] finished records: 1"
time="2024-07-16 09:34:39" level=info msg=" [pipeline service] [pipeline #12] [task #99] finished step: 8 / 9"
time="2024-07-16 09:34:39" level=info msg=" [pipeline service] [pipeline #12] [task #99] executing subtask convertWorkflows"
time="2024-07-16 09:34:39" level=info msg=" [pipeline service] [pipeline #12] [task #99] [convertWorkflows] finished records: 1"
time="2024-07-16 09:34:39" level=info msg=" [pipeline service] [pipeline #12] [task #99] finished step: 9 / 9"
```

I also have GitHub and Jira connections running within the same pipeline, and data is collected as expected for both of those plugins.

### What do you expect to happen

Data is collected for the full specified time range, e.g. starting from `2024-01-01` (or whatever start date is specified).

### How to reproduce

1. Configure a CircleCI connection using the plugin.
2. Associate the connection with a project.
3. Set a time range (or leave it at the default of 6 months).
4. Run a data collection (either normally or in full refresh mode).
5. Check the `_tool_circleci_workflows`, `_tool_circleci_pipelines` or `_tool_circleci_jobs` tables for the expected row count and the earliest `started_at` or `created_at` timestamp (see below).

### Anything else

As an aside (but potentially related), I noticed discrepancies between the column names across the three CircleCI tool tables, e.g.
- On `_tool_circleci_workflows`, `created_at` is the timestamp at which the workflow was triggered in CircleCI. No other column could represent the start of the workflow in CircleCI.
- On `_tool_circleci_jobs`, `created_at` is the timestamp at which the row was created in the DevLake DB, and `started_at` is the CircleCI timestamp.
- On `_tool_circleci_pipelines`, `created_at` is again the timestamp of creation in the DevLake DB. There is a `created_date` column, but it always seems to be `NULL`. As with the workflows table, no column appears to represent the start timestamp in CircleCI.

### Version

v1.0.0

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
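Step 5 of "How to reproduce" and the column observations under "Anything else" can be combined into a quick per-table check. The Go sketch below assumes the tables are named `_tool_circleci_*` and that the CircleCI-side timestamp lives in the columns described in those observations. `circleciTimestampColumn` and `checkQuery` are hypothetical helpers that only build the SQL text, which would then be run against the DevLake database. `_tool_circleci_pipelines` is deliberately left out because, per those observations, it has no populated column for the CircleCI start time.

```go
package main

import (
	"fmt"
	"sort"
)

// ASSUMED mapping from each CircleCI tool table to the column that holds the
// CircleCI-side timestamp, based on the observations in this issue.
// _tool_circleci_pipelines is omitted: no populated column was found for it.
var circleciTimestampColumn = map[string]string{
	"_tool_circleci_workflows": "created_at",
	"_tool_circleci_jobs":      "started_at",
}

// checkQuery builds a statement returning the row count and the earliest
// CircleCI timestamp for one table; ok is false for unmapped tables.
func checkQuery(table string) (query string, ok bool) {
	col, ok := circleciTimestampColumn[table]
	if !ok {
		return "", false
	}
	return fmt.Sprintf("SELECT COUNT(*), MIN(%s) FROM %s;", col, table), true
}

func main() {
	// Sort table names so the output order is deterministic.
	tables := make([]string, 0, len(circleciTimestampColumn))
	for t := range circleciTimestampColumn {
		tables = append(tables, t)
	}
	sort.Strings(tables)
	for _, t := range tables {
		q, _ := checkQuery(t)
		fmt.Println(q)
	}
}
```

Running it prints one statement for `_tool_circleci_jobs` (using `started_at`) and one for `_tool_circleci_workflows` (using `created_at`). If the full time range had been collected, the `MIN` values would be expected to fall well before `2024-07-15`.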