Nickcw6 opened a new issue, #7750:
URL: https://github.com/apache/incubator-devlake/issues/7750

   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/incubator-devlake/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### What happened
   
   When running a data collection for a CircleCI connection, data only appears to be collected for the past <24 hours, regardless of what `Time Range` is set to. The same behaviour is observed in 'full refresh mode' and in a normal data collection.
   
   The behaviour differed slightly each time I tried: when [originally raised on Slack](https://devlake-io.slack.com/archives/C03APJ20VM4/p1720800396443109), only the last ~3 hours of data were collected; when reproducing again to raise this issue, data from the past ~24 hours was present.
   
   E.g. with the `Time Range` set to the start of the year, then checking the `_tool_circleci_workflows` table:
   
   ![Screenshot 2024-07-16 at 10 34 
16](https://github.com/user-attachments/assets/97298c9e-ad62-4035-b306-b67ae9ad676c)
   ![Screenshot 2024-07-16 at 11 36 
54](https://github.com/user-attachments/assets/17ee063d-622a-47fc-93e0-cc0a4b4e5d65)
   
   Only 18 workflows are identified, the earliest of which occurred at `2024-07-15 10:29:09.000`. I would expect to see many more rows, dating back to `2024-01-01`.
   
   CircleCI pipeline task logs:
   
   ```
   time="2024-07-16 09:34:23" level=info msg=" [pipeline service] [pipeline #12] [task #99] start executing task: 99"
   time="2024-07-16 09:34:23" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] start plugin"
   time="2024-07-16 09:34:23" level=info msg=" [pipeline service] [pipeline #12] [task #99] [api async client] creating scheduler for api \"https://circleci.com/api/\", number of workers: 13, 10000 reqs / 1h0m0s (interval: 360ms)"
   time="2024-07-16 09:34:23" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] total step: 9"
   time="2024-07-16 09:34:23" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] executing subtask convertProjects"
   time="2024-07-16 09:34:23" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] [convertProjects] finished records: 1"
   time="2024-07-16 09:34:23" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] finished step: 1 / 9"
   time="2024-07-16 09:34:23" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] executing subtask collectPipelines"
   time="2024-07-16 09:34:23" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] [collectPipelines] collect pipelines"
   time="2024-07-16 09:34:23" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] [collectPipelines] start api collection"
   time="2024-07-16 09:34:25" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] [collectPipelines] finished records: 1"
   time="2024-07-16 09:34:25" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] [collectPipelines] end api collection without error"
   time="2024-07-16 09:34:25" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] finished step: 2 / 9"
   time="2024-07-16 09:34:25" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] executing subtask extractPipelines"
   time="2024-07-16 09:34:25" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] [extractPipelines] get data from _raw_circleci_api_pipelines 
where params={\"ConnectionId\":1,\"ProjectSlug\":\"gh/SylveraIO/web-app-mono\"} 
and got 20"
   time="2024-07-16 09:34:25" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] [extractPipelines] finished records: 1"
   time="2024-07-16 09:34:25" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] finished step: 3 / 9"
   time="2024-07-16 09:34:25" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] executing subtask collectWorkflows"
   time="2024-07-16 09:34:25" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] [collectWorkflows] collect workflows"
   time="2024-07-16 09:34:25" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] [collectWorkflows] start api collection"
   time="2024-07-16 09:34:25" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] [collectWorkflows] finished records: 1"
   time="2024-07-16 09:34:28" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] [collectWorkflows] finished records: 10"
   time="2024-07-16 09:34:31" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] [collectWorkflows] finished records: 19"
   time="2024-07-16 09:34:32" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] [collectWorkflows] end api collection without error"
   time="2024-07-16 09:34:32" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] finished step: 4 / 9"
   time="2024-07-16 09:34:32" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] executing subtask extractWorkflows"
   time="2024-07-16 09:34:32" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] [extractWorkflows] get data from _raw_circleci_api_workflows 
where params={\"ConnectionId\":1,\"ProjectSlug\":\"gh/SylveraIO/web-app-mono\"} 
and got 18"
   time="2024-07-16 09:34:32" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] [extractWorkflows] finished records: 1"
   time="2024-07-16 09:34:32" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] finished step: 5 / 9"
   time="2024-07-16 09:34:32" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] executing subtask collectJobs"
   time="2024-07-16 09:34:32" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] [collectJobs] collect jobs"
   time="2024-07-16 09:34:32" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] [collectJobs] start api collection"
   time="2024-07-16 09:34:32" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] [collectJobs] finished records: 1"
   time="2024-07-16 09:34:35" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] [collectJobs] finished records: 10"
   time="2024-07-16 09:34:38" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] [collectJobs] end api collection without error"
   time="2024-07-16 09:34:38" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] finished step: 6 / 9"
   time="2024-07-16 09:34:38" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] executing subtask extractJobs"
   time="2024-07-16 09:34:38" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] [extractJobs] get data from _raw_circleci_api_jobs where 
params={\"ConnectionId\":1,\"ProjectSlug\":\"gh/SylveraIO/web-app-mono\"} and 
got 162"
   time="2024-07-16 09:34:38" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] [extractJobs] finished records: 1"
   time="2024-07-16 09:34:38" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] finished step: 7 / 9"
   time="2024-07-16 09:34:38" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] executing subtask convertJobs"
   time="2024-07-16 09:34:38" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] [convertJobs] finished records: 1"
   time="2024-07-16 09:34:39" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] finished step: 8 / 9"
   time="2024-07-16 09:34:39" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] executing subtask convertWorkflows"
   time="2024-07-16 09:34:39" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] [convertWorkflows] finished records: 1"
   time="2024-07-16 09:34:39" level=info msg=" [pipeline service] [pipeline 
#12] [task #99] finished step: 9 / 9"
   ```
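   As a side note on the log above, the scheduler interval is just the rate-limit window divided by the request budget, so the throttling settings alone would not explain the missing history. A quick arithmetic check (values taken straight from the log line; nothing DevLake-specific assumed):

   ```python
   # Reproduce the scheduler-interval arithmetic from the log line above:
   # 10000 reqs / 1h0m0s should space requests 360 ms apart.
   rate_limit_reqs = 10_000
   window_seconds = 60 * 60  # 1h0m0s

   interval_ms = window_seconds / rate_limit_reqs * 1000
   print(f"interval: {interval_ms:.0f}ms")  # → interval: 360ms
   ```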
   
   I also have GitHub and Jira data connections running within the same pipeline, and data is pulled through as expected for both of those plugins.
   
   ### What do you expect to happen
   
   Data should be collected for the full specified time range, e.g. starting from `2024-01-01` (or whatever date is specified).
   
   ### How to reproduce
   
   1. Configure a CircleCI connection using the plugin
   2. Associate this to a project
   3. Set a time range (or leave as default for 6 months)
   4. Run a data collection (either normally or in full refresh mode)
   5. Check the `_tool_circleci_workflows`, `_tool_circleci_pipelines` or `_tool_circleci_jobs` tables for the expected row count, and the earliest `started_at` or `created_at` timestamp (see below)
   
   ### Anything else
   
   As an aside (but potentially related) - I notice there are discrepancies in the meaning of same-named columns across the three CircleCI tool tables, e.g.
   
   - On `_tool_circleci_workflows` - `created_at` is the timestamp the workflow was triggered in CircleCI. There is no other column which could represent the start of the workflow in CircleCI.
   - On `_tool_circleci_jobs` - `created_at` is the timestamp the row was created in the DevLake DB, and `started_at` is the CircleCI timestamp.
   - On `_tool_circleci_pipelines` - `created_at` is again the timestamp of DevLake DB creation. There is `created_date`, but this always seems to be `NULL`. As with the workflows table, there doesn't appear to be any column which represents the starting timestamp in CircleCI.
   
   ### Version
   
   v1.0.0
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   

