(devlake) branch main updated: fix: Copilot plugin - add backfill time window to ensure data consistency (#8811)

eldrick Fri, 05 Jun 2026 07:51:22 -0700

This is an automated email from the ASF dual-hosted git repository.

ewega pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/devlake.git



The following commit(s) were added to refs/heads/main by this push:
     new 8a3af1f97 fix: Copilot plugin - add backfill time window to ensure data consistency (#8811)
8a3af1f97 is described below

commit 8a3af1f97f47c40b244e9fb374256cd4224ec090
Author: Tamás Laczkó-Albert <[email protected]>
AuthorDate: Fri Jun 5 17:51:08 2026 +0300

    fix: Copilot plugin - add backfill time window to ensure data consistency (#8811)
    
    * fix: Add backfill time window to ensure data consistency
    
    chore: Add backfill time window to ensure data consistency
    (cherry picked from commit 52a9555b782860f9c8ba9db409bd56e0c8f58272)
    
    * fix(q_dev): prevent data duplication in user_report and user_data tables (#8737)
    
    * fix(q_dev): prevent data duplication in user_report and user_data tables
    
    Replace auto-increment ID with composite primary keys so that
    CreateOrUpdate can properly deduplicate rows on re-extraction.
    
    - user_report PK: (connection_id, scope_id, user_id, date, client_type)
    - user_data PK: (connection_id, scope_id, user_id, date)
    - Switch db.Create() to db.CreateOrUpdate() in s3_data_extractor
    - Migration drops old tables, rebuilds with new PKs, resets s3_file_meta
      processed flag to trigger re-extraction
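The deduplication effect of a composite key can be sketched in memory. This is a minimal illustration, not the plugin's code: the `reportKey` field types, the `store` map, and the `createOrUpdate` helper are hypothetical stand-ins for the table and for the DAL's CreateOrUpdate.

```go
package main

import "fmt"

// reportKey mirrors the composite primary key listed for user_report.
// The field types are assumptions made for this sketch.
type reportKey struct {
	ConnectionID uint64
	ScopeID      string
	UserID       string
	Date         string
	ClientType   string
}

// store is a hypothetical in-memory stand-in for the user_report table.
type store map[reportKey]int

// createOrUpdate overwrites the row that has the same key instead of
// appending a second row, which is what makes re-extraction idempotent.
func (s store) createOrUpdate(k reportKey, credits int) {
	s[k] = credits
}

func main() {
	s := store{}
	k := reportKey{ConnectionID: 1, ScopeID: "scope", UserID: "user-a", Date: "2026-03-01", ClientType: "KIRO_IDE"}
	s.createOrUpdate(k, 10)
	s.createOrUpdate(k, 12) // the same file extracted a second time
	fmt.Println(len(s), s[k]) // prints "1 12"
}
```

With an auto-increment ID, the second call would have produced a second row, since nothing ties the two writes to the same logical record.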
    
    * fix(q_dev): gofmt archived user_data_v2 model
    
    * feat(github): Extend exclusion of file extensions to github plugin (#8719)
    
    * feat(github): extend PR size exclusion for specified file extension to github plugin
    
    * fix: register migration script
    
    * fix: move PR size to 'Additional settings' and change so the comma doesn't get removed while typing
    
    * fix: linting
    
    * fix(doc): update expired Slack invite links in README (#8739)
    
    The Slack invite links in README.md were expired and returning
    "This link is no longer active." Updated both occurrences (badge
    and community section) to match the current link on the official
    DevLake website.
    
    Closes #8738
    
    Co-authored-by: Spiff Azeta <[email protected]>
    
    * docs: add gh-devlake CLI to Getting Started installation options (#8733)
    
    Adds gh-devlake as a third installation method alongside Docker Compose
    and Helm. gh-devlake is a GitHub CLI extension that automates DevLake
    deployment, configuration, and monitoring from the terminal.
    
    Closes #8732
    
    * fix(gitlab): add missing repos scope in project_mapping (#8743)
    
    GitLab's makeScopeV200 did not create a repos scope when
    scopeConfig.Entities was empty or only contained CROSS. This
    caused project_mapping to have no table='repos' row, breaking
    downstream DORA metrics, PR-issue linking, and all PR dashboard
    panels that join on project_mapping.
    
    The fix aligns GitLab with the GitHub plugin by:
    1. Defaulting empty entities to plugin.DOMAIN_TYPES
    2. Adding DOMAIN_TYPE_CROSS to the repo scope condition
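The two steps above can be sketched as a small predicate. This is an assumption-laden illustration rather than the plugin's makeScopeV200: the constants are local stand-ins for the plugin package's domain types (their string values are assumed), and `needsRepoScope` is a hypothetical helper name.

```go
package main

import "fmt"

// Local stand-ins for the plugin package's domain type constants.
// The string values are assumptions made for this sketch.
const (
	domainTypeCode       = "CODE"
	domainTypeCodeReview = "CODEREVIEW"
	domainTypeCross      = "CROSS"
	domainTypeTicket     = "TICKET"
	domainTypeCICD       = "CICD"
)

// allDomainTypes plays the role of plugin.DOMAIN_TYPES.
var allDomainTypes = []string{
	domainTypeCode, domainTypeCodeReview, domainTypeCross,
	domainTypeTicket, domainTypeCICD,
}

// needsRepoScope reports whether a repos scope should be created.
func needsRepoScope(entities []string) bool {
	if len(entities) == 0 {
		entities = allDomainTypes // step 1: default empty entities
	}
	for _, e := range entities {
		switch e {
		case domainTypeCode, domainTypeCodeReview, domainTypeCross: // step 2: CROSS counts too
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(needsRepoScope(nil))
	fmt.Println(needsRepoScope([]string{domainTypeCross}))
	fmt.Println(needsRepoScope([]string{domainTypeTicket}))
}
```

Before the fix, the first two calls would have corresponded to configurations that produced no repos row in project_mapping.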
    
    Closes #8742
    
    Co-authored-by: Spiff Azeta <[email protected]>
    
    * fix(grafana): update dashboard descriptions to list all supported data sources (#8741)
    
    Several dashboard introduction panels hardcoded "GitHub and Jira" as
    required data sources, even though the underlying queries use generic
    domain layer tables that work with any supported Git tool or issue
    tracker. Updated to list all supported sources following the pattern
    already used by DORA and WorkLogs dashboards.
    
    Closes #8740
    
    Co-authored-by: Spiff Azeta <[email protected]>
    
    * fix: modify cicd_deployments name from varchar to text (#8724)
    
    * fix: modify cicd_deployments name from varchar to text
    
    * fix: update the year
    
    * fix(q_dev): replace MariaDB-specific IF NOT EXISTS syntax with DAL methods for MySQL 8.x compatibility (#8745)
    
    * fix(azuredevops): default empty entities and add CROSS to repo scope in makeScopeV200 (#8751)
    
    When scopeConfig.Entities is empty (common when no entities are
    explicitly selected in the UI), makeScopeV200 produced zero scopes,
    leaving project_mapping with no rows. Additionally, the repo scope
    condition did not check for DOMAIN_TYPE_CROSS, so selecting only
    CROSS would not create a repo scope, breaking DORA metrics.
    
    This adds the same fixes applied to GitLab in #8743.
    
    Closes #8749
    
    * fix(bitbucket): default empty entities to all domain types in makeScopesV200 (#8750)
    
    When scopeConfig.Entities is empty (common when no entities are
    explicitly selected in the UI), makeScopesV200 produced zero scopes,
    leaving project_mapping with no repo rows. This adds the same
    empty-entities default applied to GitLab in #8743.
    
    Closes #8748
    
    * feat(circleci): add server version requirement and endpoint help text (#8757)
    
    Update CircleCI connection form to indicate Server v4.x+ requirement
    and provide guidance for server endpoint configuration.
    
    Signed-off-by: Joshua Smith <[email protected]>
    
    * feat(asana): add Asana plugin for project and task collection (#8758)
    
    Add a new Asana plugin that integrates with Asana's REST API to collect
    projects, sections, tasks, subtasks, stories (comments), tags, and users,
    mapping them to DevLake's ticket/board domain model.
    
    Backend:
    - Plugin implementation with all required interfaces (PluginMeta,
      PluginTask, PluginModel, PluginMigration, PluginSource, PluginApi,
      DataSourcePluginBlueprintV200)
    - Collectors, extractors, and converters for projects, sections, tasks,
      subtasks, stories, tags, and users
    - Remote API scope picker (Workspaces -> Teams/Portfolios -> Projects)
    - Scope config with issue-type regex transformation rules
    - Migration scripts for schema evolution
    - E2E tests with CSV fixtures for project and task data flows
    
    Config UI:
    - Plugin registration with connection form (PAT auth, endpoint, proxy)
    - Scope config transformation form for issue-type mapping
    - Dashboard URL integration for onboarding flow
    
    Grafana:
    - Asana dashboard with task metrics and visualizations
    
    Made-with: Cursor
    
    * feat: GitHub App token refresh (#8746)
    
    * feat(github): auto-refresh GitHub App installation tokens
    
    Add transport-level token refresh for GitHub App (AppKey) connections.
    GitHub App installation tokens expire after ~1 hour; this adds proactive
    refresh (before expiry) and reactive refresh (on 401) using the existing
    TokenProvider/RefreshRoundTripper infrastructure.
    
    New files:
    - app_installation_refresh.go: refresh logic + DB persistence
    - refresh_api_client.go: minimal ApiClient for token refresh POST
    - cmd/test_refresh/main.go: manual test script for real GitHub Apps
    
    Modified:
    - connection.go: export GetInstallationAccessToken, parse ExpiresAt
    - token_provider.go: add refreshFn for pluggable refresh strategies
    - round_tripper.go: document dual Authorization header interaction
    - api_client.go: wire AppKey connections into refresh infrastructure
    - Tests updated for new constructors and AppKey refresh flow
    
    * feat(github): add diagnostic logging to GitHub App token refresh
    
    Add structured logging at key decision points for token refresh:
    - Token provider creation (connection ID, installation ID, expiry)
    - Round tripper installation (connection ID, auth method)
    - Proactive refresh trigger (near-expiry detection)
    - Refresh start/success/failure (old/new token prefixes, expiry times)
    - DB persistence success/failure
    - Reactive 401 refresh and skip-due-to-concurrent-refresh
    
    All logs route through the DevLake logger to pipeline log files.
    
    * fix(github): prevent deadlock and fix token persistence in App token refresh
    
    Deadlock fix: NewAppInstallationTokenProvider now captures client.Transport
    (the base transport) before wrapping with RefreshRoundTripper. The refresh
    function uses newRefreshApiClientWithTransport(baseTransport) to POST for
    new installation tokens, bypassing the RefreshRoundTripper entirely.
    
    Token persistence fix: PersistEncryptedTokenColumns() manually encrypts
    tokens via plugin.Encrypt() then writes ciphertext via dal.UpdateColumns
    with conn.TableName() (a string) as the first argument. Passing the table
    name string makes GORM use Table() instead of Model(), preventing the
    encdec serializer from corrupting the in-memory token value. The encryption
    secret is threaded from taskCtx.GetConfig(ENCRYPTION_SECRET) through
    CreateApiClient to TokenProvider to persist functions.
    
    Also persists the initial App token at startup for DB consistency, and
    adds TestProactiveRefreshNoDeadlock with a real RSA key to verify the
    deadlock scenario is resolved.
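The proactive half of the refresh flow can be sketched as follows. This is not the plugin's TokenProvider: the struct layout, the five-minute margin, and the helpers `at` and `newDemoProvider` are hypothetical. The comment on `refreshFn` restates the deadlock constraint described above.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// tokenProvider is a hypothetical, simplified token holder.
type tokenProvider struct {
	mu        sync.Mutex
	token     string
	expiresAt time.Time
	margin    time.Duration
	// refreshFn must not route its request back through this provider
	// (for example via a wrapped transport), otherwise it would try to
	// take mu again while Token still holds it.
	refreshFn func() (string, time.Time, error)
}

// Token returns the current token, refreshing it first when it is within
// margin of expiry.
func (p *tokenProvider) Token(now time.Time) (string, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if now.Add(p.margin).After(p.expiresAt) {
		tok, exp, err := p.refreshFn()
		if err != nil {
			return "", err
		}
		p.token, p.expiresAt = tok, exp
	}
	return p.token, nil
}

// at returns a fixed reference time plus the given number of minutes.
func at(minutes int) time.Time {
	return time.Unix(0, 0).UTC().Add(time.Duration(minutes) * time.Minute)
}

// newDemoProvider builds a provider whose token expires at minute 60 and
// whose refresh function counts how often it was called.
func newDemoProvider(calls *int) *tokenProvider {
	return &tokenProvider{
		token:     "old",
		expiresAt: at(60),
		margin:    5 * time.Minute,
		refreshFn: func() (string, time.Time, error) {
			*calls++
			return "new", at(120), nil
		},
	}
}

func main() {
	calls := 0
	p := newDemoProvider(&calls)
	early, _ := p.Token(at(10))
	late, _ := p.Token(at(56))
	fmt.Println(early, late, calls)
}
```

The reactive path (refresh on a 401 response) would call the same `refreshFn` from the round tripper and is omitted here.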
    
    * fix(github): remove unused refresh client constructor and update tests
    
    ---------
    
    Co-authored-by: Spiff Azeta <[email protected]>
    Co-authored-by: Spiff Azeta <[email protected]>
    Co-authored-by: Dan Crews <[email protected]>
    Co-authored-by: Tomoya Kawaguchi <[email protected]>
    
    * fix: cwe89 sql injection (#8762)
    
    * feat(q-dev): add logging data ingestion and enrich Kiro dashboards (#8767)
    
    * feat(q-dev): add logging data ingestion and enrich Kiro dashboards
    
    Add support for ingesting S3 logging data (GenerateAssistantResponse and
    GenerateCompletions events) into new database tables, and enrich all three
    Kiro Grafana dashboards with additional metrics.
    
    Changes:
    - New models: QDevChatLog and QDevCompletionLog for logging event data
    - New extractor: s3_logging_extractor.go parses JSON.gz logging files
    - Updated S3 collector to also handle .json.gz files
    - Added logging S3 prefixes (GenerateAssistantResponse, GenerateCompletions)
    - New dashboard: "Kiro AI Activity Insights" with 10 panels including
      model usage distribution, active hours, conversation depth, feature
      adoption (Steering/Spec), file type usage, and prompt/response trends
    - Enriched "Kiro Code Metrics Dashboard" with DocGeneration, TestGeneration,
      and Dev (Agentic) metric panels
    - Fixed "Kiro Usage Dashboard" per-user table to sort by user_id
    - Migration script for new tables
    
    * fix(q-dev): use separate base path for logging S3 prefixes
    
    Logging data lives under a different S3 prefix ("logging/") than user
    report data ("user-report/"). Add LoggingBasePath option (defaults to
    "logging") so logging prefixes are constructed correctly.
    
    * fix(q-dev): auto-scan logging path without extra config
    
    Kiro exports to two well-known S3 prefixes in the same bucket:
    - user-report/AWSLogs/{accountId}/KiroLogs/ (CSV reports)
    - logging/AWSLogs/{accountId}/KiroLogs/ (interaction logs)
    
    When AccountId is set, automatically scan both paths. The "logging"
    prefix is hardcoded since it's a standard Kiro export convention.
    No additional configuration needed.
    
    * fix(q-dev): update scope tooltip to mention logging data scanning
    
    * fix(q-dev): fix scope ID routing and CSV/JSON file separation
    
    Three fixes:
    1. Use *scopeId (catch-all) route pattern instead of :scopeId so scope
       IDs containing "/" (e.g. "034362076319/2026") work in URL paths
    2. CSV extractor now filters for .csv files only, preventing it from
       trying to parse .json.gz logging files as CSV
    3. Frontend scope API calls now encodeURIComponent(scopeId) for safe
       URL encoding
    
    * fix(q-dev): resolve *scopeId route conflict with dispatcher pattern
    
    The catch-all *scopeId route conflicts with *scopeId/latest-sync-state.
    Follow Jenkins/Bitbucket pattern: use a single *scopeId route with a
    GetScopeDispatcher that checks for /latest-sync-state suffix and
    dispatches accordingly. All scope handlers now TrimLeft "/" from scopeId.
    
    * fix(q-dev): use URL-safe scope ID format (underscore separator)
    
    Scope IDs like "034362076319/2026" break URL routing because "/" is a
    path separator. Change ID format to "034362076319_2026" (underscore)
    when AccountId is set. The Prefix field still uses "/" for S3 path
    matching. Revert to standard :scopeId routes since IDs are now safe.
    
    Note: existing scopes need to be recreated after this change.
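A minimal sketch of the ID change, assuming the scope ID is derived from the S3-style prefix by swapping the separator. `urlSafeScopeID` is a hypothetical helper name; only the two example values come from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// urlSafeScopeID derives a scope ID that contains no "/" so it can be used
// as a single :scopeId path segment. The prefix itself is left untouched
// for S3 path matching.
func urlSafeScopeID(prefix string) string {
	return strings.ReplaceAll(prefix, "/", "_")
}

func main() {
	prefix := "034362076319/2026"
	fmt.Println(urlSafeScopeID(prefix)) // prints "034362076319_2026"
}
```

Because the ID changes shape, rows keyed by the old "/" form no longer match, which is consistent with the note that existing scopes need to be recreated.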
    
    * fix(q-dev): use NoPKModel instead of Model in archived logging models
    
    archived.Model only has ID+timestamps, missing RawDataOrigin fields
    (_raw_data_params etc.) that common.NoPKModel includes. This caused
    "Unknown column '_raw_data_params'" errors at runtime.
    
    * fix(q-dev): fix GROUP BY in per-user table to merge display_name variants
    
    Remove display_name from GROUP BY so same user_id with different
    display_name values gets merged. Use MAX(display_name) in SELECT.
    
    * fix(q-dev): normalize logging user IDs to match CSV short UUID format
    
    Logging data uses "d-{directoryId}.{UUID}" format while CSV user-report
    uses plain "{UUID}". Strip the "d-xxx." prefix so the same user maps to
    one user_id across both data sources.
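The normalization can be sketched like this. The function name matches the `normalizeUserId` mentioned in this commit series, but the body is an assumption about how the prefix is stripped, and the sample IDs are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeUserId strips a leading "d-{directoryId}." prefix, if present,
// so that logging IDs and CSV IDs map to the same user_id.
// Sketch only: the real implementation may differ.
func normalizeUserId(id string) string {
	if strings.HasPrefix(id, "d-") {
		if dot := strings.Index(id, "."); dot >= 0 {
			return id[dot+1:]
		}
	}
	return id
}

func main() {
	fmt.Println(normalizeUserId("d-1234567890.1111-2222-3333"))
	fmt.Println(normalizeUserId("1111-2222-3333"))
}
```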
    
    * fix(q-dev): normalize user IDs in CSV extractors and sort table DESC
    
    Apply normalizeUserId to both createUserReportData and
    createUserDataWithDisplayName so user_report CSV data also strips
    the "d-{directoryId}." prefix. Change per-user table sort to
    ORDER BY user_id DESC.
    
    * style(q-dev): fix gofmt formatting in chat_log models
    
    * perf(q-dev): parallelize logging S3 downloads and batch DB writes
    
    Optimize logging extractor performance:
    - 10 goroutine workers for parallel S3 file downloads
    - Batch 50 files per DB transaction instead of 1-per-file
    - sync.Map cache for display name resolution (avoid repeated IAM calls)
    - Parse records in memory during download, write all at once
    
    This should improve throughput from ~1.5 files/sec to ~15+ files/sec
    for typical logging file sizes.
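The worker-pool and batching pattern described above can be sketched as follows. `fetch` is a hypothetical stand-in for downloading and parsing one S3 file, and `flush` stands in for one DB transaction; neither is the extractor's real code.

```go
package main

import (
	"fmt"
	"sync"
)

// fetch is a hypothetical stand-in for downloading and parsing one file.
func fetch(key string) string { return "parsed:" + key }

// processAll fetches keys with a fixed number of workers and passes the
// results to flush in batches of batchSize (the last batch may be smaller).
// flush is always called from the calling goroutine.
func processAll(keys []string, workers, batchSize int, flush func(batch []string)) {
	jobs := make(chan string)
	results := make(chan string)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for k := range jobs {
				results <- fetch(k)
			}
		}()
	}
	go func() { // feed the workers
		for _, k := range keys {
			jobs <- k
		}
		close(jobs)
	}()
	go func() { // close results once every worker is done
		wg.Wait()
		close(results)
	}()

	batch := make([]string, 0, batchSize)
	for r := range results {
		batch = append(batch, r)
		if len(batch) == batchSize {
			flush(batch)
			batch = make([]string, 0, batchSize)
		}
	}
	if len(batch) > 0 {
		flush(batch)
	}
}

func main() {
	keys := make([]string, 120)
	for i := range keys {
		keys[i] = fmt.Sprintf("file-%03d.json.gz", i)
	}
	processAll(keys, 10, 50, func(batch []string) {
		fmt.Println("writing batch of", len(batch))
	})
}
```

With 120 keys, 10 workers, and a batch size of 50, `flush` runs three times (50, 50, and 20 items) instead of 120 times.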
    
    * fix(q-dev): check tx.Rollback error return to satisfy errcheck lint
    
    * feat(q-dev): add per-user model usage table and models column
    
    Add "Per-User Model Usage" table (panel 11) showing each user's
    request count and avg prompt/response length per model_id. Also add
    "Models Used" column to the Per-User Activity table.
    
    * fix(q-dev): remove per-user model usage table, keep models column only
    
    * feat(q-dev): add Kiro Executive Dashboard with cross-source analytics
    
    New dashboard "Kiro Executive Dashboard" with 12 panels covering:
    - KPIs: WAU, credits efficiency, acceptance rate, steering adoption
    - Trends: weekly active users, new vs returning users
    - Adoption funnel: Chat→Inline→CodeFix→Review→DocGen→TestGen→Agentic→Steering→Spec
    - Cost: credits pace vs projected monthly, idle power users
    - Quality: acceptance rate trends, code review findings, test generation
    - Efficiency: per-user productivity table with credits/line ratio
    
    Correlates data across user_report (credits), user_data (code metrics),
    and chat_log (interaction patterns) for holistic Kiro usage insights.
    
    * fix(q-dev): fix pie charts to show per-row slices instead of single total
    
    Set reduceOptions.values=true so Grafana treats each SQL result row as
    a separate pie slice. Fixes Model Usage Distribution, File Type Usage,
    Kiro Feature Adoption, and Active File Types pie charts.
    
    * fix(q-dev): cast Hour to string for Active Hours bar chart x-axis
    
    * fix(q-dev): fix pie chart single-slice and GROUP BY display_name issues
    
    1. qdev_user_report Panel 4 (Subscription Tier Distribution): set
       reduceOptions.values=true to show per-tier slices
    2. qdev_user_data Panel 6 (User Interactions): remove display_name
       from GROUP BY, use MAX(display_name) to merge same user
    
    * fix(q-dev): prevent data inflation in user_report JOIN user_data
    
    user_report has multiple rows per (user_id, date) due to client_type
    (KIRO_IDE, KIRO_CLI), but user_data has only one row per (user_id, date).
    A direct JOIN causes user_data metrics to be counted multiple times.
    
    Fix: pre-aggregate user_report by (user_id, date) in a subquery before
    joining, so the JOIN is always 1:1.
    
    Affects: Credits Efficiency stat and User Productivity table.
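The inflation and its fix can be reproduced in memory. This is an illustrative sketch, not the dashboard SQL: the row types and the `LinesAccepted` metric are hypothetical, and the two functions mimic a direct join versus a join against a pre-aggregated subquery.

```go
package main

import "fmt"

// key is the (user_id, date) pair both tables are joined on.
type key struct{ UserID, Date string }

type reportRow struct {
	key
	ClientType string
	Credits    float64
}

type dataRow struct {
	key
	LinesAccepted int
}

// naiveJoinLines sums LinesAccepted over a direct join, so a user_data row
// is counted once per matching user_report row.
func naiveJoinLines(reports []reportRow, data []dataRow) int {
	total := 0
	for _, r := range reports {
		for _, d := range data {
			if r.key == d.key {
				total += d.LinesAccepted
			}
		}
	}
	return total
}

// preAggregatedLines collapses reports to one entry per key first, so each
// user_data row matches at most once.
func preAggregatedLines(reports []reportRow, data []dataRow) int {
	credits := map[key]float64{}
	for _, r := range reports {
		credits[r.key] += r.Credits
	}
	total := 0
	for _, d := range data {
		if _, ok := credits[d.key]; ok {
			total += d.LinesAccepted
		}
	}
	return total
}

func main() {
	k := key{UserID: "u1", Date: "2026-03-01"}
	reports := []reportRow{
		{key: k, ClientType: "KIRO_IDE", Credits: 3},
		{key: k, ClientType: "KIRO_CLI", Credits: 2},
	}
	data := []dataRow{{key: k, LinesAccepted: 100}}
	fmt.Println(naiveJoinLines(reports, data), preAggregatedLines(reports, data)) // prints "200 100"
}
```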
    
    * feat(qa): add is_invalid field to qa_test_case_executions (#8764)
    
    * feat(qa): add is_invalid field to qa_test_case_executions
    
    Add is_invalid boolean field to the domain layer qa_test_case_executions
    table to allow QA teams to flag test executions as invalid due to
    environmental issues, flaky tests, false positives, or false negatives.
    
    Changes:
    - Add IsInvalid field to QaTestCaseExecution domain model
    - Create migration script (20260313_add_is_invalid_to_qa_test_case_executions)
    - Register migration in migrationscripts/register.go
    - Update customize service to set default value for is_invalid
    - Update E2E test data to include new column
    
    Resolves #8763
    
    Co-Authored-By: Claude Opus 4.6 <[email protected]>
    
    * fix(qa): handle missing is_invalid column in CSV import
    
    Fix PostgreSQL compatibility issue when CSV files don't contain
    the is_invalid column. The field now defaults to false instead
    of an empty string.
    
    Changes:
    - Update qaTestCaseExecutionHandler to check for empty string values
    - Add E2E test for backward compatibility with CSV files lacking is_invalid
    - Add explicit IsInvalid initialization in Testmo plugin converter
    
    Resolves #8763
    
    ---------
    
    Co-authored-by: Claude Opus 4.6 <[email protected]>
    
    * feat(linker): link when branch names contain issue keys (#8777)
    
    * feat(linker): branch names containing issue keys
    
    * chore: add testing data
    
    * Add codespell support with configuration and fixes (#8761)
    
    * ci(codespell): add codespell config and GitHub Actions workflow
    
    Add .codespellrc with skip patterns for generated files, camelCase/PascalCase
    ignore-regex, and project-specific word list (convertor, crypted, te, thur).
    Add GitHub Actions workflow to run codespell on push to main and PRs.
    
    Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <[email protected]>
    Signed-off-by: Yaroslav Halchenko <[email protected]>
    
    * fix(codespell): fix ambiguous typos requiring context review
    
    Manual fixes for typos that needed human review to avoid breaking code:
    - Comment/string typos: occured->occurred, destory->destroy, writting->writing,
      retreive->retrieve, identifer->identifier, etc.
    - Struct field comments and documentation corrections
    - Migration script comment fixes (preserving Go identifiers like DataConvertor)
    
    Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <[email protected]>
    Signed-off-by: Yaroslav Halchenko <[email protected]>
    
    * fix(codespell): fix non-ambiguous typos with codespell -w
    
    Automated fix via `codespell -w` for clear-cut typos across backend, config-ui,
    and grafana dashboards. Examples: sucess->success, occurence->occurrence,
    exeucte->execute, asynchornous->asynchronous, Grafana panel typos, etc.
    
    Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <[email protected]>
    Signed-off-by: Yaroslav Halchenko <[email protected]>
    
    ---------
    
    Signed-off-by: Yaroslav Halchenko <[email protected]>
    Co-authored-by: Claude Code 2.1.63 / Claude Opus 4.6 <[email protected]>
    
    * feat(q-dev): enrich logging fields, separate dashboards, add E2E tests (#8786)
    
    * feat(q-dev): enrich logging fields, separate dashboards by data source, add E2E tests
    
    - Add new fields to chat_log: CodeReferenceCount, WebLinkCount, HasFollowupPrompts
      (from codeReferenceEvents, supplementaryWebLinksEvent, followupPrompts in JSON)
    - Add new fields to completion_log: LeftContextLength, RightContextLength
      (from leftContext/rightContext in JSON)
    - Update s3_logging_extractor to parse and populate new fields
    - Add migration script 20260319_add_logging_fields
    - Create qdev_feature_metrics dashboard for legacy by_user_analytic data
    - Reorganize qdev_executive dashboard with Row dividers labeling data sources
      and cross-dashboard navigation links
    - Enrich qdev_logging dashboard with new panels:
      Chat Trigger Type Distribution, Response Enrichment Breakdown,
      Completion Context Size Trends, Response Enrichment Trends
    - Fix SQL compatibility with only_full_group_by mode in executive dashboard
      (Weekly Active Users Trend, New vs Returning Users)
    - Fix Steering Adoption stat panel returning string instead of numeric value
    - Add Playwright E2E test covering full pipeline flow and dashboard verification
    
    * fix: add Apache license headers to e2e files, fix gofmt alignment
    
    * fix: add SQL identifier validation to prevent SQL injection via table/column names (#8769)
    
    Add ValidateTableName and ValidateColumnName functions in core/dal to ensure
    table and column names used in dynamic SQL are safe identifiers. Applied to
    scope_service_helper, scope_generic_helper, and customized_fields_extractor.
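An identifier check of this kind can be sketched with an allow-list pattern. The rule used here (letters, digits, and underscores, not starting with a digit) is an assumption; the actual rules in ValidateTableName and ValidateColumnName are not shown in this message. `isSafeIdentifier` and `buildCountQuery` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

// identifierPattern is an assumed allow-list for SQL identifiers.
var identifierPattern = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// isSafeIdentifier reports whether name matches the allow-list.
func isSafeIdentifier(name string) bool {
	return identifierPattern.MatchString(name)
}

// buildCountQuery refuses to interpolate a table name that fails the check.
// Values should still go through bound parameters; only identifiers, which
// cannot be bound, need this kind of validation.
func buildCountQuery(table string) (string, error) {
	if !isSafeIdentifier(table) {
		return "", fmt.Errorf("unsafe table name: %q", table)
	}
	return "SELECT COUNT(*) FROM " + table, nil
}

func main() {
	fmt.Println(buildCountQuery("pull_requests"))
	fmt.Println(buildCountQuery("users; DROP TABLE issues"))
}
```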
    
    * feat(q-dev): add Kiro Credits + DORA Correlation dashboard (#8792)
    
    Add a new Grafana dashboard that correlates Kiro AI usage (credits,
    messages, active users) with DORA metrics at weekly aggregate level.
    
    Panels include:
    - Pearson's r correlation between weekly credits and PR cycle time
    - High AI Usage vs Low AI Usage cycle time comparison
    - Weekly credits vs deployment frequency trend
    - Weekly credits vs change failure rate trend
    
    Data is joined by week_start between _tool_q_dev_user_report and
    project_pr_metrics / cicd_deployment_commits.
    
    * feat(q-dev): add AI Cost-Efficiency dashboard (#8793)
    
    Add a Grafana dashboard showing AI tool cost-efficiency metrics:
    - Credits per merged PR (overall + weekly trend)
    - Credits per production deployment (overall + weekly trend)
    - Credits per issue resolved (overall + weekly trend)
    - Weekly AI activity volume (credits, messages, conversations)
    
    Joins _tool_q_dev_user_report with pull_requests,
    cicd_deployment_commits, and issues by weekly aggregation.
    
    * feat(q-dev): add Multi-AI Tool Comparison dashboard (Copilot vs Kiro) (#8794)
    
    Add a Grafana dashboard comparing GitHub Copilot and Kiro side by side:
    - Weekly active users comparison
    - Code suggestions & acceptance events (per tool)
    - LOC accepted comparison (combined time series)
    - Acceptance rate comparison (bar gauge)
    
    Template variables for Copilot connection/scope selection.
    Data from _tool_copilot_enterprise_daily_metrics vs
    _tool_q_dev_user_report and _tool_q_dev_user_data.
    
    * feat(q-dev): add Kiro AI Model ROI dashboard (#8795)
    
    Add a Grafana dashboard analyzing per-model performance from chat logs:
    - Model Performance Summary table (requests, share%, avg prompt/response
      length, response/prompt ratio, steering/spec mode usage)
    - Daily Model Usage Distribution (stacked bar chart)
    - Avg Response Length by Model trend (output quality proxy)
    
    Data source: _tool_q_dev_chat_log grouped by model_id.
    
    * feat(q-dev): add Steering & Spec Mode Adoption dashboard (#8798)
    
    Track Kiro steering rules and spec mode adoption:
    - User/request adoption rate stats
    - Weekly adoption rate trend
    - Steering impact on prompt/response length
    - Per-user feature adoption table
    
    * feat(q-dev): add Developer AI Productivity Hours dashboard (#8797)
    
    Analyze when developers are most productive with AI tools:
    - AI Activity by Hour of Day (chat + completions stacked bar)
    - Prompt & Response Length by Hour (complexity patterns)
    - Feature Usage by Hour (steering/spec mode/plain chat)
    - AI Activity by Day of Week
    
    * feat(q-dev): add Language AI Heatmap dashboard (#8796)
    
    Analyze AI-assisted coding patterns by programming language:
    - Language Completion Profile table (requests, avg completions,
      context sizes, users per language)
    - Daily Completions by Language (stacked bar)
    - Active File Types During Chat (donut)
    - Avg Context Size by Language trend (top 5)
    
    * Fix/circleci column names (#8799)
    
    * fix(circleci): rename created_at to created_date in jobs/workflows
    Add migration to copy created_at -> created_date and update models/converters.
    
    * fix(circleci): update pipeline parsing
    
    * test(circleci): add incremental tests for collectors
    
    * fix(jenkins): scope multi-branch build collection to current project (#8430) (#8781)
    
    The branch jobs query in collectMultiBranchJobApiBuilds selected all
    WorkflowJob entries across all multi-branch pipelines for a connection,
    causing builds to be duplicated and misattributed. Filter by
    _raw_data_params to collect only the current project's branch jobs.
    
    Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
    
    * fix: Make gh-copilot plugin database agnostic (#8779)
    
    Co-authored-by: Eldrick Wega <[email protected]>
    
    * fix(sonarqube): increase cq_issues and cq_file_metrics project_key length to 500 (#8783)
    
    Fixes #8331
    
    * feat: added taiga plugin (#8755)
    
    * feat: added taiga plugin
    
    * fix: fixed tests
    
    * feat(gh-copilot): add support for organization daily user metrics (#8747)
    
    * fix: fixed test files
    
    ---------
    
    Signed-off-by: Joshua Smith <[email protected]>
    Co-authored-by: Reece Ward <[email protected]>
    Co-authored-by: Joshua Smith <[email protected]>
    
    * fix(docker): pin Poetry to 2.2.1 for Python 3.9 compatibility (#8735)
    
    Poetry 2.3.0 dropped Python 3.9 support. Without cache the installer
    fetches the latest version (currently 2.3.2), which fails on the
    python:3.9-slim-bookworm base image. Pin to 2.2.1, the last release
    compatible with Python 3.9.
    
    Co-authored-by: Rodrigo Silva <[email protected]>
    
    * fix(linker): scope clearHistoryData to current project only (#8814) (#8815)
    
    The clearHistoryData() function used a LEFT JOIN with project_name
    in the ON clause, causing the subquery to return all PR IDs regardless
    of project. This effectively wiped the entire pull_request_issues table
    on every linker run, deleting links from other projects sharing the
    same repos and links created by the GitHub converter.
    
    Fix:
    - Use INNER JOIN + WHERE for proper project scoping
    - Add issue-side subquery scoped to current project's boards
    - Filter by _raw_data_table/_raw_data_remark to only delete
      linker-created rows
    
    Add e2e test for cross-project shared repo scenario.
    
    * fix(circleci): prevent negative values when calculating circleci workflow duration (#8800)
    
    * fix: sonarqube: missing api/users/search endpoint (#8813)
    
    * fix(argocd): extract revision from multi-source application revisions[] (#8810)
    
    ---------
    
    Signed-off-by: Joshua Smith <[email protected]>
    Signed-off-by: Yaroslav Halchenko <[email protected]>
    Co-authored-by: tamas.albert <[email protected]>
    Co-authored-by: Warren Chen <[email protected]>
    Co-authored-by: Ema Abitante <[email protected]>
    Co-authored-by: Spiff Azeta <[email protected]>
    Co-authored-by: Spiff Azeta <[email protected]>
    Co-authored-by: Eldrick Wega <[email protected]>
    Co-authored-by: Dan Crews <[email protected]>
    Co-authored-by: Tomoya Kawaguchi <[email protected]>
    Co-authored-by: Joshua Smith <[email protected]>
    Co-authored-by: jawad khan <[email protected]>
    Co-authored-by: Leif Roger Frøysaa <[email protected]>
    Co-authored-by: Klesh Wong <[email protected]>
    Co-authored-by: NaRro <[email protected]>
    Co-authored-by: Claude Opus 4.6 <[email protected]>
    Co-authored-by: Reece Ward <[email protected]>
    Co-authored-by: Yaroslav Halchenko <[email protected]>
    Co-authored-by: Chris Pavlicek <[email protected]>
    Co-authored-by: AvivGuiser <[email protected]>
    Co-authored-by: Shayne Clausson <[email protected]>
    Co-authored-by: irfanuddinahmad <[email protected]>
    Co-authored-by: Rodrigo Silva <[email protected]>
    Co-authored-by: Rodrigo Silva <[email protected]>
    Co-authored-by: Daniele M. <[email protected]>
    Co-authored-by: Pavel Sturc <[email protected]>
    Co-authored-by: Anvesh Vemula <[email protected]>
---
 .../tasks/enterprise_metrics_collector.go          |  41 +-----
 .../gh-copilot/tasks/metrics_collector_test.go     |  76 +++++++++-
 .../gh-copilot/tasks/org_metrics_collector.go      |   2 +-
 .../gh-copilot/tasks/report_download_helper.go     | 157 +++++++++++++++++++++
 .../gh-copilot/tasks/user_metrics_collector.go     |  82 ++++++-----
 5 files changed, 282 insertions(+), 76 deletions(-)

diff --git a/backend/plugins/gh-copilot/tasks/enterprise_metrics_collector.go b/backend/plugins/gh-copilot/tasks/enterprise_metrics_collector.go
index a47ef573e..16061cf50 100644
--- a/backend/plugins/gh-copilot/tasks/enterprise_metrics_collector.go
+++ b/backend/plugins/gh-copilot/tasks/enterprise_metrics_collector.go
@@ -20,7 +20,6 @@ package tasks
 import (
        "encoding/json"
        "fmt"
-       "io"
        "net/http"
        "net/url"
        "time"
@@ -76,6 +75,7 @@ func CollectEnterpriseMetrics(taskCtx plugin.SubTaskContext) errors.Error {
 
        now := time.Now().UTC()
        start, until := computeReportDateRange(now, collector.GetSince())
+       start = clampDailyMetricsStartForBackfill(start, until)
        logger := taskCtx.GetLogger()
 
        dayIter := newDayIterator(start, until)
@@ -95,44 +95,7 @@ func CollectEnterpriseMetrics(taskCtx plugin.SubTaskContext) errors.Error {
                Concurrency:   1,
                AfterResponse: ignoreNoContent,
                ResponseParser: func(res *http.Response) ([]json.RawMessage, errors.Error) {
-                       body, readErr := io.ReadAll(res.Body)
-                       res.Body.Close()
-                       if readErr != nil {
-                               return nil, errors.Default.Wrap(readErr, "failed to read report metadata")
-                       }
-                       if isEmptyReport(body) {
-                               return nil, nil
-                       }
-
-                       var meta reportMetadataResponse
-                       if jsonErr := json.Unmarshal(body, &meta); jsonErr != nil {
-                               snippet := string(body)
-                               if len(snippet) > 200 {
-                                       snippet = snippet[:200]
-                               }
-                               logger.Error(jsonErr, "failed to parse report metadata, body=%s", snippet)
-                               return nil, errors.Default.Wrap(jsonErr, "failed to parse report metadata")
-                       }
-
-                       if len(meta.DownloadLinks) == 0 {
-                               logger.Info("No download links for report day=%s, skipping", meta.ReportDay)
-                               return nil, nil
-                       }
-
-                       // Download each report file and return contents as raw messages
-                       var results []json.RawMessage
-                       for _, link := range meta.DownloadLinks {
-                               reportBody, dlErr := downloadReport(link, logger)
-                               if dlErr != nil {
-                                       logger.Error(nil, "failed to download report for day=%s: %s", meta.ReportDay, dlErr.Error())
-                                       return nil, dlErr
-                               }
-                               if reportBody == nil {
-                                       continue // blob not found, skip
-                               }
-                               results = append(results, json.RawMessage(reportBody))
-                       }
-                       return results, nil
+                       return parseRawReportResponse(res, logger)
                },
        })
        if err != nil {
diff --git a/backend/plugins/gh-copilot/tasks/metrics_collector_test.go b/backend/plugins/gh-copilot/tasks/metrics_collector_test.go
index e8b1dbecc..d71d20d44 100644
--- a/backend/plugins/gh-copilot/tasks/metrics_collector_test.go
+++ b/backend/plugins/gh-copilot/tasks/metrics_collector_test.go
@@ -18,6 +18,8 @@ limitations under the License.
 package tasks
 
 import (
+       "bytes"
+       "io"
        "net/http"
        "testing"
        "time"
@@ -45,6 +47,7 @@ func TestComputeReportDateRangeDefaultLookback(t *testing.T) {
 }
 
 func TestComputeReportDateRangeUsesSince(t *testing.T) {
+       // since is far enough in the past that the lookback buffer doesn't apply.
        now := time.Date(2025, 1, 10, 12, 0, 0, 0, time.UTC)
        since := time.Date(2025, 1, 3, 12, 0, 0, 0, time.UTC)
        start, until := computeReportDateRange(now, &since)
@@ -61,11 +64,82 @@ func TestComputeReportDateRangeClampsToLookback(t *testing.T) {
 }
 
 func TestComputeReportDateRangeClampsFutureSince(t *testing.T) {
+       // Future since is clamped to until, then the lookback buffer applies.
        now := time.Date(2025, 1, 10, 12, 0, 0, 0, time.UTC)
        since := now.Add(24 * time.Hour)
        start, until := computeReportDateRange(now, &since)
        require.Equal(t, time.Date(2025, 1, 9, 0, 0, 0, 0, time.UTC), until)
-       require.Equal(t, time.Date(2025, 1, 9, 0, 0, 0, 0, time.UTC), start)
+       require.Equal(t, time.Date(2025, 1, 7, 0, 0, 0, 0, time.UTC), start)
+}
+
+func TestComputeReportDateRangeLookbackBuffer(t *testing.T) {
+       // since is yesterday: without the buffer we'd only request 1 day (yesterday).
+       // With the buffer we look back reportLookbackDays days to retry any 404'd days.
+       now := time.Date(2025, 1, 10, 0, 0, 0, 0, time.UTC)  // midnight run
+       since := time.Date(2025, 1, 9, 0, 0, 0, 0, time.UTC) // LatestSuccessStart from previous midnight run
+       start, until := computeReportDateRange(now, &since)
+       require.Equal(t, time.Date(2025, 1, 9, 0, 0, 0, 0, time.UTC), until)
+       require.Equal(t, time.Date(2025, 1, 7, 0, 0, 0, 0, time.UTC), start)
+}
+
+func TestClampDailyMetricsStartForBackfillRecentStart(t *testing.T) {
+       until := time.Date(2025, 1, 9, 0, 0, 0, 0, time.UTC)
+       start := time.Date(2025, 1, 7, 0, 0, 0, 0, time.UTC)
+
+       clamped := clampDailyMetricsStartForBackfill(start, until)
+       require.Equal(t, time.Date(2025, 1, 6, 0, 0, 0, 0, time.UTC), clamped)
+}
+
+func TestClampDailyMetricsStartForBackfillKeepsOlderStart(t *testing.T) {
+       until := time.Date(2025, 1, 9, 0, 0, 0, 0, time.UTC)
+       start := time.Date(2025, 1, 3, 0, 0, 0, 0, time.UTC)
+
+       clamped := clampDailyMetricsStartForBackfill(start, until)
+       require.Equal(t, start, clamped)
+}
+
+func TestUserMetricsDateRangeAppliesFourDayBackfillWindow(t *testing.T) {
+       now := time.Date(2025, 1, 10, 0, 0, 0, 0, time.UTC)
+       since := time.Date(2025, 1, 9, 0, 0, 0, 0, time.UTC)
+
+       start, until := computeReportDateRange(now, &since)
+       start = clampDailyMetricsStartForBackfill(start, until)
+
+       require.Equal(t, time.Date(2025, 1, 9, 0, 0, 0, 0, time.UTC), until)
+       require.Equal(t, time.Date(2025, 1, 6, 0, 0, 0, 0, time.UTC), start)
+}
+
+func TestParseReportMetadataResponseNoContent(t *testing.T) {
+       res := &http.Response{
+               StatusCode: http.StatusNoContent,
+               Body:       io.NopCloser(bytes.NewReader(nil)),
+       }
+
+       meta, err := parseReportMetadataResponse(res, nil)
+       require.NoError(t, err)
+       require.Nil(t, meta)
+}
+
+func TestParseReportMetadataResponseEmptyBody(t *testing.T) {
+       res := &http.Response{
+               StatusCode: http.StatusOK,
+               Body:       io.NopCloser(bytes.NewReader(nil)),
+       }
+
+       meta, err := parseReportMetadataResponse(res, nil)
+       require.NoError(t, err)
+       require.Nil(t, meta)
+}
+
+func TestParseReportMetadataResponseEmptyString(t *testing.T) {
+       res := &http.Response{
+               StatusCode: http.StatusOK,
+               Body:       io.NopCloser(bytes.NewReader([]byte(`""`))),
+       }
+
+       meta, err := parseReportMetadataResponse(res, nil)
+       require.NoError(t, err)
+       require.Nil(t, meta)
 }
 
 func TestIsEmptyReport(t *testing.T) {
diff --git a/backend/plugins/gh-copilot/tasks/org_metrics_collector.go b/backend/plugins/gh-copilot/tasks/org_metrics_collector.go
index 5afeee1f8..8f651c482 100644
--- a/backend/plugins/gh-copilot/tasks/org_metrics_collector.go
+++ b/backend/plugins/gh-copilot/tasks/org_metrics_collector.go
@@ -20,7 +20,6 @@ package tasks
 import (
        "encoding/json"
        "fmt"
-       "io"
        "net/http"
        "net/url"
        "time"
@@ -70,6 +69,7 @@ func CollectOrgMetrics(taskCtx plugin.SubTaskContext) errors.Error {
 
        now := time.Now().UTC()
        start, until := computeReportDateRange(now, collector.GetSince())
+       start = clampDailyMetricsStartForBackfill(start, until)
        logger := taskCtx.GetLogger()
 
        dayIter := newDayIterator(start, until)
diff --git a/backend/plugins/gh-copilot/tasks/report_download_helper.go b/backend/plugins/gh-copilot/tasks/report_download_helper.go
index 538fd90e0..236b03988 100644
--- a/backend/plugins/gh-copilot/tasks/report_download_helper.go
+++ b/backend/plugins/gh-copilot/tasks/report_download_helper.go
@@ -33,6 +33,14 @@ import (
 // reportMaxDays is the maximum historical window the new report API supports (1 year).
 const reportMaxDays = 365
 
+// reportLookbackDays: extra days rewound from 'until' on incremental runs.
+// GitHub reports are generated hours after midnight, so a midnight run gets 404 for the previous
+// day. Without this buffer, 'LatestSuccessStart' advances past the missed day permanently.
+const reportLookbackDays = 2
+
+// dailyMetricsTrailingBackfillDays extends retries for delayed daily report generation.
+const dailyMetricsTrailingBackfillDays = 4
+
 // copilotRawParams identifies a set of raw data records for a given connection/scope.
 type copilotRawParams struct {
        ConnectionId uint64
@@ -62,6 +70,14 @@ func ignoreNoContent(res *http.Response) errors.Error {
        return nil
 }
 
+func clampDailyMetricsStartForBackfill(start, until time.Time) time.Time {
+       trailingStart := until.AddDate(0, 0, -(dailyMetricsTrailingBackfillDays - 1))
+       if start.After(trailingStart) {
+               return trailingStart
+       }
+       return start
+}
+
 // isEmptyReport returns true when the GitHub API returned an HTTP 200 but the
 // body carries no usable report data.  For dates before Copilot usage data was
 // available the API responds with "" (empty JSON string) instead of a 404.
@@ -79,7 +95,144 @@ type reportMetadataResponse struct {
        ReportEndDay   string `json:"report_end_day"`
 }
 
+func readReportMetadataBody(res *http.Response) ([]byte, errors.Error) {
+       body, readErr := io.ReadAll(res.Body)
+       res.Body.Close()
+       if readErr != nil {
+               return nil, errors.Default.Wrap(readErr, "failed to read report metadata")
+       }
+       return body, nil
+}
+
+func logReportMetadataParseError(body []byte, err error, logger log.Logger) {
+       if logger == nil {
+               return
+       }
+       snippet := string(body)
+       if len(snippet) > 200 {
+               snippet = snippet[:200]
+       }
+       logger.Error(err, "failed to parse report metadata, body=%s", snippet)
+}
+
+func reportMetadataRange(meta reportMetadataResponse) string {
+       if meta.ReportDay != "" {
+               return meta.ReportDay
+       }
+       if meta.ReportStartDay != "" && meta.ReportEndDay != "" {
+               return fmt.Sprintf("%s..%s", meta.ReportStartDay, meta.ReportEndDay)
+       }
+       return ""
+}
+
+func logMissingDownloadLinks(meta reportMetadataResponse, logger log.Logger) {
+       if logger == nil || len(meta.DownloadLinks) != 0 {
+               return
+       }
+       reportRange := reportMetadataRange(meta)
+       if reportRange != "" {
+               logger.Info("No download links for report day=%s, skipping", reportRange)
+               return
+       }
+       logger.Info("No download links in report metadata, skipping")
+}
+
+func parseReportMetadata(body []byte, logger log.Logger) (*reportMetadataResponse, errors.Error) {
+       trimmed := bytes.TrimSpace(body)
+       if len(trimmed) == 0 {
+               if logger != nil {
+                       logger.Info("Report metadata response was empty, skipping")
+               }
+               return nil, nil
+       }
+
+       // Handle JSON-encoded empty string ""
+       if bytes.Equal(trimmed, []byte(`""`)) {
+               if logger != nil {
+                       logger.Info("Report metadata response was empty string, skipping")
+               }
+               return nil, nil
+       }
+
+       var meta reportMetadataResponse
+       if jsonErr := json.Unmarshal(trimmed, &meta); jsonErr != nil {
+               logReportMetadataParseError(trimmed, jsonErr, logger)
+               return nil, errors.Default.Wrap(jsonErr, "failed to parse report metadata")
+       }
+
+       logMissingDownloadLinks(meta, logger)
+
+       return &meta, nil
+}
+
+func parseReportMetadataResponse(res *http.Response, logger log.Logger) (*reportMetadataResponse, errors.Error) {
+       if res.StatusCode == http.StatusNoContent {
+               if logger != nil {
+                       logger.Info("Report metadata not ready yet (204), skipping for now")
+               }
+               res.Body.Close()
+               return nil, nil
+       }
+
+       body, readErr := readReportMetadataBody(res)
+       if readErr != nil {
+               return nil, readErr
+       }
+
+       return parseReportMetadata(body, logger)
+}
+
+func collectRawReportRecords(meta *reportMetadataResponse, logger log.Logger) ([]json.RawMessage, errors.Error) {
+       if len(meta.DownloadLinks) == 0 {
+               logger.Info("No download links for report day=%s, skipping", meta.ReportDay)
+               return nil, nil
+       }
+
+       var results []json.RawMessage
+       for _, link := range meta.DownloadLinks {
+               reportBody, dlErr := downloadReport(link, logger)
+               if dlErr != nil {
+                       return nil, dlErr
+               }
+               if reportBody == nil {
+                       continue
+               }
+               results = append(results, json.RawMessage(reportBody))
+       }
+       return results, nil
+}
+
+func parseRawReportResponse(res *http.Response, logger log.Logger) ([]json.RawMessage, errors.Error) {
+       body, readErr := io.ReadAll(res.Body)
+       res.Body.Close()
+       if readErr != nil {
+               return nil, errors.Default.Wrap(readErr, "failed to read report metadata")
+       }
+       if isEmptyReport(body) {
+               return nil, nil
+       }
+
+       var meta *reportMetadataResponse
+       if jsonErr := json.Unmarshal(body, &meta); jsonErr != nil {
+               snippet := string(body)
+               if len(snippet) > 200 {
+                       snippet = snippet[:200]
+               }
+               logger.Error(jsonErr, "failed to parse report metadata, body=%s", snippet)
+               return nil, errors.Default.Wrap(jsonErr, "failed to parse report metadata")
+       }
+
+       meta, err := parseReportMetadataResponse(res, logger)
+       if err != nil || meta == nil {
+               return nil, err
+       }
+
+       return collectRawReportRecords(meta, logger)
+}
+
 // computeReportDateRange returns the range of dates to collect, clamped to the API max.
+// When 'since' is set, 'start' is rewound to at least 'until - reportLookbackDays'
+// so days that returned 404 (report not yet generated) are retried on subsequent runs.
 func computeReportDateRange(now time.Time, since *time.Time) (start, until time.Time) {
        until = utcDate(now).AddDate(0, 0, -1) // reports are available for the previous day
        min := until.AddDate(0, 0, -(reportMaxDays - 1))
@@ -92,6 +245,10 @@ func computeReportDateRange(now time.Time, since *time.Time) (start, until time.
                if start.After(until) {
                        start = until
                }
+               // Rewind 'start' by 'reportLookbackDays' so recently-missed days are retried.
+               if lookback := until.AddDate(0, 0, -reportLookbackDays); start.After(lookback) {
+                       start = lookback
+               }
        }
        return start, until
 }
diff --git a/backend/plugins/gh-copilot/tasks/user_metrics_collector.go b/backend/plugins/gh-copilot/tasks/user_metrics_collector.go
index 06905015a..a092a0450 100644
--- a/backend/plugins/gh-copilot/tasks/user_metrics_collector.go
+++ b/backend/plugins/gh-copilot/tasks/user_metrics_collector.go
@@ -26,13 +26,57 @@ import (
        "time"
 
        "github.com/apache/incubator-devlake/core/errors"
+       "github.com/apache/incubator-devlake/core/log"
        "github.com/apache/incubator-devlake/core/plugin"
        helper "github.com/apache/incubator-devlake/helpers/pluginhelper/api"
 )
 
 const rawUserMetricsTable = "copilot_user_metrics"
 
-// CollectUserMetrics collects user-level daily Copilot usage reports.
+func collectUserMetricsRecords(meta *reportMetadataResponse, logger log.Logger) ([]json.RawMessage, errors.Error) {
+       var results []json.RawMessage
+       for _, link := range meta.DownloadLinks {
+               reportBody, dlErr := downloadReport(link, logger)
+               if dlErr != nil {
+                       return nil, dlErr
+               }
+               if reportBody == nil {
+                       continue // blob not found, skip
+               }
+               // Parse JSONL: split by newlines and return each non-empty line.
+               userRecords, parseErr := parseJSONL(reportBody)
+               if parseErr != nil {
+                       return nil, parseErr
+               }
+               results = append(results, userRecords...)
+       }
+       return results, nil
+}
+
+func parseUserMetricsReportResponse(res *http.Response, logger log.Logger) ([]json.RawMessage, errors.Error) {
+       body, readErr := io.ReadAll(res.Body)
+       res.Body.Close()
+       if readErr != nil {
+               return nil, errors.Default.Wrap(readErr, "failed to read report metadata")
+       }
+       if isEmptyReport(body) {
+               return nil, nil
+       }
+
+       var meta *reportMetadataResponse
+       if jsonErr := json.Unmarshal(body, &meta); jsonErr != nil {
+               return nil, errors.Default.Wrap(jsonErr, "failed to parse report metadata")
+       }
+
+       meta, err := parseReportMetadataResponse(res, logger)
+       if err != nil || meta == nil {
+               return nil, err
+       }
+
+       return collectUserMetricsRecords(meta, logger)
+}
+
+// CollectUserMetrics collects enterprise user-level daily Copilot usage reports.
 // These reports are in JSONL format (one JSON object per line per user).
 // Utilizes the enterprise or organization endpoints depending on connection configuration
 func CollectUserMetrics(taskCtx plugin.SubTaskContext) errors.Error {
@@ -76,6 +120,7 @@ func CollectUserMetrics(taskCtx plugin.SubTaskContext) errors.Error {
 
        now := time.Now().UTC()
        start, until := computeReportDateRange(now, collector.GetSince())
+       start = clampDailyMetricsStartForBackfill(start, until)
        logger := taskCtx.GetLogger()
 
        dayIter := newDayIterator(start, until)
@@ -94,40 +139,7 @@ func CollectUserMetrics(taskCtx plugin.SubTaskContext) errors.Error {
                Concurrency:   1,
                AfterResponse: ignoreNoContent,
                ResponseParser: func(res *http.Response) ([]json.RawMessage, errors.Error) {
-                       body, readErr := io.ReadAll(res.Body)
-                       res.Body.Close()
-                       if readErr != nil {
-                               return nil, errors.Default.Wrap(readErr, "failed to read report metadata")
-                       }
-                       if isEmptyReport(body) {
-                               return nil, nil
-                       }
-
-                       var meta reportMetadataResponse
-                       if jsonErr := json.Unmarshal(body, &meta); jsonErr != nil {
-                               return nil, errors.Default.Wrap(jsonErr, "failed to parse report metadata")
-                       }
-
-                       // User reports are JSONL — each download link returns one file where
-                       // each line is a separate JSON object for one user's daily metrics.
-                       // We download the file and split into individual JSON messages.
-                       var results []json.RawMessage
-                       for _, link := range meta.DownloadLinks {
-                               reportBody, dlErr := downloadReport(link, logger)
-                               if dlErr != nil {
-                                       return nil, dlErr
-                               }
-                               if reportBody == nil {
-                                       continue // blob not found, skip
-                               }
-                               // Parse JSONL: split by newlines and return each non-empty line
-                               userRecords, parseErr := parseJSONL(reportBody)
-                               if parseErr != nil {
-                                       return nil, parseErr
-                               }
-                               results = append(results, userRecords...)
-                       }
-                       return results, nil
+                       return parseUserMetricsReportResponse(res, logger)
                },
        })
        if err != nil {
