[ https://issues.apache.org/jira/browse/GOBBLIN-2211?focusedWorklogId=974732&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-974732 ]
ASF GitHub Bot logged work on GOBBLIN-2211:
-------------------------------------------

Author: ASF GitHub Bot
Created on: 16/Jul/25 05:50
Start Date: 16/Jul/25 05:50
Worklog Time Spent: 10m
Work Description: NamsB7 opened a new pull request, #4121:
URL: https://github.com/apache/gobblin/pull/4121

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

### JIRA
- [x] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN-2211) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
    - https://issues.apache.org/jira/browse/GOBBLIN-2211

### Description
- [x] Here are some details about my PR, including screenshots (if applicable):
    - Introduces an error classification system for Gobblin jobs, enabling automatic priority-based categorization of job failures based on configurable error pattern descriptions.
    - Modifies the job execution flow to classify errors on job failure
    - Integrates with the existing JobIssueEventHandler for error logging
    - No breaking changes to existing functionality
    - Classification only runs on job failures, and only when enabled

#### Why These Changes
- Provides intelligent error analysis by matching failures against known patterns
- Consolidates multiple errors into a single, prioritized final error with enriched context
- Suggests probable root causes rather than a definitive diagnosis
- Appends pattern-matched summaries to enhance error visibility
- Reduces time spent manually correlating similar failures
- Builds organizational knowledge of common failure patterns

##### Key features
- Pattern-based Classification: Uses regex patterns to match and categorize errors
- Priority-based Selection: Returns the highest-priority error when multiple patterns match
- Extensible Storage: Pluggable storage backend (in-memory, MySQL, etc.)
- Performance Optimized: Early stopping mechanism to avoid unnecessary pattern matching
- Configurable: Dynamic SQL column sizes and table names
- This change introduces two major new components:
    - ErrorClassifier: New class defining the contract for error classification
    - ErrorPatternStore: New interface for managing error pattern persistence

##### Configurations
- `error.regex.db.table.key`: MySQL table for error patterns
- `error.categories.db.table.key`: MySQL table for error categories
- `error.regex.max.varchar.size`: Configurable VARCHAR size for pattern storage
- `error.category.max.varchar.size`: Configurable VARCHAR size for category names

###### Service Configurations
- `errorPatternStore.class`: Configures the pattern store implementation
- `errorClassification.enabled`: Toggles classification. Disabled by default
- `errorClassification.maxErrorsInFinal`: Caps the number of errors in the final error
- `errorClassification.maxErrorsToProcess`: Limits the number of errors processed, for performance

### Tests
- [x] My PR adds the following unit tests:
    - **ErrorClassifierTest**: Simple unit test suite checking, for a given list of issues with severity ERROR:
        - Pattern matching and classification logic
        - Priority-based error selection when multiple patterns match
        - Early stopping optimization
        - Edge cases (null/empty errors, no matches)
        - Integration with the InMemory ErrorPatternStore implementation

### Commits
- [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Issue Time Tracking
-------------------

Worklog Id: (was: 974732)
Remaining Estimate: 0h
Time Spent: 10m

> Implement Error Classification based on execution issues
> --------------------------------------------------------
>
> Key: GOBBLIN-2211
> URL: https://issues.apache.org/jira/browse/GOBBLIN-2211
> Project: Apache Gobblin
> Issue Type: Bug
> Components: gobblin-service
> Reporter: Abhishek Jain
> Assignee: Abhishek Tiwari
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Implement Error Classification to categorize the failure reason based on
> issues encountered.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
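
Editorial illustration of the pull request description above: the pattern-based classification, priority-based selection, and early stopping it lists could look roughly like the following minimal sketch. This is not the code from PR #4121. The class name `ErrorClassifierSketch`, the nested `ErrorPattern` type, the `classify` method, and the convention that a lower number means a higher priority are all assumptions made for illustration; only the idea of a cap like `errorClassification.maxErrorsToProcess` comes from the description.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical sketch: all names here are illustrative and are not
// taken from the Gobblin pull request.
final class ErrorClassifierSketch {

  /** A regex tied to a category. Assumption: a lower number means a higher priority. */
  static final class ErrorPattern {
    final Pattern regex;
    final String category;
    final int priority;

    ErrorPattern(String regex, String category, int priority) {
      this.regex = Pattern.compile(regex);
      this.category = category;
      this.priority = priority;
    }
  }

  private final List<ErrorPattern> patterns; // sorted, highest priority first
  private final int maxErrorsToProcess;      // cap on how many errors are inspected

  ErrorClassifierSketch(List<ErrorPattern> patterns, int maxErrorsToProcess) {
    this.patterns = new ArrayList<>(patterns);
    this.patterns.sort(Comparator.comparingInt((ErrorPattern p) -> p.priority));
    this.maxErrorsToProcess = maxErrorsToProcess;
  }

  /** Returns the category of the highest-priority matching pattern, if any. */
  Optional<String> classify(List<String> errorMessages) {
    if (errorMessages == null || errorMessages.isEmpty()) {
      return Optional.empty();
    }
    ErrorPattern best = null;
    int limit = Math.min(errorMessages.size(), maxErrorsToProcess);
    for (int i = 0; i < limit; i++) {
      String message = errorMessages.get(i);
      if (message == null) {
        continue;
      }
      for (ErrorPattern p : patterns) {
        // Patterns are sorted, so nothing from here on can beat the current best.
        if (best != null && p.priority >= best.priority) {
          break;
        }
        if (p.regex.matcher(message).find()) {
          best = p;
          break;
        }
      }
      // Early stop: the top-priority pattern has matched, no error can outrank it.
      if (best != null && best.priority == patterns.get(0).priority) {
        break;
      }
    }
    return Optional.ofNullable(best).map(p -> p.category);
  }
}
```

In this sketch the patterns are sorted once at construction time, so the inner loop can stop as soon as it reaches patterns that cannot outrank the current best match, and the outer loop stops once the top-priority pattern has matched. A real pattern store would load the patterns from a backend instead of taking them as a constructor argument.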