[ https://issues.apache.org/jira/browse/GOBBLIN-2211?focusedWorklogId=974732&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-974732 ]
ASF GitHub Bot logged work on GOBBLIN-2211:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 16/Jul/25 05:50
Start Date: 16/Jul/25 05:50
Worklog Time Spent: 10m
Work Description: NamsB7 opened a new pull request, #4121:
URL: https://github.com/apache/gobblin/pull/4121
Dear Gobblin maintainers,
Please accept this PR. I understand that it will not be reviewed until I
have checked off all the steps below!
### JIRA
- [x] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN-2211) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
  - https://issues.apache.org/jira/browse/GOBBLIN-2211
### Description
- [x] Here are some details about my PR, including screenshots (if applicable):
  - Introduces an error classification system for Gobblin jobs, enabling automatic, priority-based categorization of job failures using configurable error pattern descriptions
  - Modifies the job execution flow to classify errors when a job fails
  - Integrates with the existing JobIssueEventHandler for error logging
  - Makes no breaking changes to existing functionality
  - Runs classification only on job failures, and only when it is enabled
#### Why These Changes
- Analyzes failures by matching them against known error patterns
- Consolidates multiple errors into a single, prioritized final error with enriched context
- Suggests a probable root cause rather than a definitive diagnosis
- Appends pattern-matched summaries to improve error visibility
- Reduces time spent manually correlating similar failures
- Builds organizational knowledge of common failure patterns
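The consolidation step described above could look roughly like the sketch below. It is not the PR's implementation: `ClassifiedIssue`, `FinalErrorBuilder`, the "lower number means higher priority" convention and the message layout are all hypothetical, chosen only to show how several matched issues might be folded into one prioritized final error that names a probable cause.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical result of matching one issue against a known error pattern. */
final class ClassifiedIssue {
  final String message;
  final String category;
  final int priority; // assumed convention: lower value = more important

  ClassifiedIssue(String message, String category, int priority) {
    this.message = message;
    this.category = category;
    this.priority = priority;
  }
}

final class FinalErrorBuilder {
  private FinalErrorBuilder() {}

  /**
   * Folds up to maxErrors matched issues, most important first, into one final error message.
   * The top category is reported as a probable cause, not a definitive diagnosis.
   */
  static String build(List<ClassifiedIssue> issues, int maxErrors) {
    if (issues.isEmpty() || maxErrors <= 0) {
      return "";
    }
    List<ClassifiedIssue> sorted = new ArrayList<>(issues);
    // List.sort is stable, so issues with equal priority keep their original order.
    sorted.sort(Comparator.comparingInt(c -> c.priority));
    StringBuilder sb = new StringBuilder("Probable cause: ").append(sorted.get(0).category);
    int limit = Math.min(maxErrors, sorted.size());
    for (int k = 0; k < limit; k++) {
      ClassifiedIssue issue = sorted.get(k);
      // Append a pattern-matched summary line for each retained issue.
      sb.append("\n  [").append(issue.category).append("] ").append(issue.message);
    }
    if (sorted.size() > limit) {
      sb.append("\n  ... ").append(sorted.size() - limit).append(" more issue(s) omitted");
    }
    return sb.toString();
  }
}
```

Keeping every retained issue in the message, rather than only the winner, preserves context for cases where the top-priority match turns out not to be the real root cause.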
##### Key features
- Pattern-based Classification: Uses regex patterns to match and categorize errors
- Priority-based Selection: Returns the highest-priority error when multiple patterns match
- Extensible Storage: Pluggable storage backend (in-memory, MySQL, etc.)
- Performance Optimized: An early-stopping mechanism avoids unnecessary pattern matching
- Configurable: SQL column sizes and table names are set through configuration
- This change introduces two major new components:
  - ErrorClassifier: New class defining the contract for error classification
  - ErrorPatternStore: New interface for managing error pattern persistence
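A minimal sketch of how pattern matching, priority-based selection and early stopping could fit together, assuming each pattern carries a regex, a category and an integer priority where a lower value is more important. `ErrorPattern`, `getAllPatterns`, `InMemoryErrorPatternStore` and the `classify` signature are hypothetical names for illustration, not the actual Gobblin API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical pattern entry: a compiled regex, its category, and a priority (lower = more important). */
final class ErrorPattern {
  final Pattern regex;
  final String category;
  final int priority;

  ErrorPattern(String regex, String category, int priority) {
    this.regex = Pattern.compile(regex);
    this.category = category;
    this.priority = priority;
  }
}

/** Hypothetical storage contract; a MySQL-backed store would implement the same method. */
interface ErrorPatternStore {
  /** Returns all patterns sorted by ascending priority value (most important first). */
  List<ErrorPattern> getAllPatterns();
}

final class InMemoryErrorPatternStore implements ErrorPatternStore {
  private final List<ErrorPattern> patterns;

  InMemoryErrorPatternStore(List<ErrorPattern> patterns) {
    this.patterns = new ArrayList<>(patterns);
    // Sort once so callers always see the most important pattern first.
    this.patterns.sort(Comparator.comparingInt(p -> p.priority));
  }

  @Override
  public List<ErrorPattern> getAllPatterns() {
    return Collections.unmodifiableList(patterns);
  }
}

final class ErrorClassifier {
  private final ErrorPatternStore store;
  private final int maxErrorsToProcess;

  ErrorClassifier(ErrorPatternStore store, int maxErrorsToProcess) {
    this.store = store;
    this.maxErrorsToProcess = maxErrorsToProcess;
  }

  /** Returns the category of the highest-priority pattern matched by any error, or null if none match. */
  String classify(List<String> errorMessages) {
    List<ErrorPattern> patterns = store.getAllPatterns();
    if (errorMessages == null || errorMessages.isEmpty() || patterns.isEmpty()) {
      return null;
    }
    int bestIndex = patterns.size(); // sentinel meaning "no match yet"
    int processed = 0;
    for (String message : errorMessages) {
      if (processed++ >= maxErrorsToProcess) {
        break; // cap the work done on very large failure lists
      }
      if (message == null) {
        continue;
      }
      // Only patterns strictly more important than the current best can improve the result.
      for (int i = 0; i < bestIndex; i++) {
        if (patterns.get(i).regex.matcher(message).find()) {
          bestIndex = i;
          break;
        }
      }
      if (bestIndex == 0) {
        break; // early stop: the most important pattern has already matched
      }
    }
    return bestIndex == patterns.size() ? null : patterns.get(bestIndex).category;
  }
}
```

Sorting the patterns once by priority means the first match for a message is also that message's best match, so each later message only needs to be tested against strictly higher-priority patterns, and the loop can stop as soon as the top-priority pattern matches.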
##### Configurations
- `error.regex.db.table.key`: MySQL table for error patterns
- `error.categories.db.table.key`: MySQL table for error categories
- `error.regex.max.varchar.size`: Configurable VARCHAR size for pattern storage
- `error.category.max.varchar.size`: Configurable VARCHAR size for category names
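The sketch below shows one way the table-name and VARCHAR-size settings could drive schema creation. Only the four property keys come from the PR; the column names, the fallback defaults (`error_regexes`, `error_categories`, 2000, 255) and the `ErrorPatternSchema` helper are assumptions. Because table names cannot be bound as JDBC parameters, the sketch validates them as plain identifiers before concatenating them into SQL.

```java
import java.util.Properties;

/**
 * Hypothetical helper that builds MySQL CREATE TABLE statements from the PR's configuration keys.
 * Column names and fallback defaults are illustrative assumptions.
 */
final class ErrorPatternSchema {
  static final String REGEX_TABLE_KEY = "error.regex.db.table.key";
  static final String CATEGORY_TABLE_KEY = "error.categories.db.table.key";
  static final String REGEX_SIZE_KEY = "error.regex.max.varchar.size";
  static final String CATEGORY_SIZE_KEY = "error.category.max.varchar.size";

  private ErrorPatternSchema() {}

  static String categoryTableDdl(Properties props) {
    String table = identifier(props, CATEGORY_TABLE_KEY, "error_categories");
    int categorySize = positiveInt(props, CATEGORY_SIZE_KEY, 255);
    return "CREATE TABLE IF NOT EXISTS " + table + " ("
        + "category_name VARCHAR(" + categorySize + ") NOT NULL PRIMARY KEY, "
        + "priority INT NOT NULL)";
  }

  static String regexTableDdl(Properties props) {
    String table = identifier(props, REGEX_TABLE_KEY, "error_regexes");
    int regexSize = positiveInt(props, REGEX_SIZE_KEY, 2000);
    int categorySize = positiveInt(props, CATEGORY_SIZE_KEY, 255);
    return "CREATE TABLE IF NOT EXISTS " + table + " ("
        + "description_regex VARCHAR(" + regexSize + ") NOT NULL, "
        + "category_name VARCHAR(" + categorySize + ") NOT NULL)";
  }

  /** Table names cannot be bound as JDBC parameters, so only plain identifiers are accepted. */
  private static String identifier(Properties props, String key, String fallback) {
    String value = props.getProperty(key, fallback).trim();
    if (!value.matches("[A-Za-z_][A-Za-z0-9_]*")) {
      throw new IllegalArgumentException(key + " is not a valid table name: " + value);
    }
    return value;
  }

  private static int positiveInt(Properties props, String key, int fallback) {
    String raw = props.getProperty(key);
    int value = raw == null ? fallback : Integer.parseInt(raw.trim());
    if (value <= 0) {
      throw new IllegalArgumentException(key + " must be positive, got " + value);
    }
    return value;
  }
}
```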
###### Service Configurations
- `errorPatternStore.class`: Configures the pattern store implementation
- `errorClassification.enabled`: Enables or disables classification (disabled by default)
- `errorClassification.maxErrorsInFinal`: Caps the number of errors included in the final error
- `errorClassification.maxErrorsToProcess`: Limits the number of errors processed, for performance
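A sketch of how these service settings might be read and applied, using `java.util.Properties` to stay self-contained (the service may read them through a different configuration API). The disabled-by-default behavior matches the PR; the defaults of 10 and 1000 for the two caps, and the `ErrorClassificationConfig` class itself, are hypothetical.

```java
import java.util.List;
import java.util.Properties;

/**
 * Hypothetical holder for the service-level settings named in the PR.
 * java.util.Properties is used for self-containment; the service may read these differently.
 */
final class ErrorClassificationConfig {
  static final String STORE_CLASS_KEY = "errorPatternStore.class";
  static final String ENABLED_KEY = "errorClassification.enabled";
  static final String MAX_IN_FINAL_KEY = "errorClassification.maxErrorsInFinal";
  static final String MAX_TO_PROCESS_KEY = "errorClassification.maxErrorsToProcess";

  final String storeClassName; // null when no store implementation is configured
  final boolean enabled;
  final int maxErrorsInFinal;
  final int maxErrorsToProcess;

  private ErrorClassificationConfig(String storeClassName, boolean enabled,
      int maxErrorsInFinal, int maxErrorsToProcess) {
    this.storeClassName = storeClassName;
    this.enabled = enabled;
    this.maxErrorsInFinal = maxErrorsInFinal;
    this.maxErrorsToProcess = maxErrorsToProcess;
  }

  static ErrorClassificationConfig from(Properties props) {
    return new ErrorClassificationConfig(
        props.getProperty(STORE_CLASS_KEY),
        Boolean.parseBoolean(props.getProperty(ENABLED_KEY, "false")), // disabled by default, as in the PR
        nonNegative(props, MAX_IN_FINAL_KEY, 10),       // assumed default
        nonNegative(props, MAX_TO_PROCESS_KEY, 1000));  // assumed default
  }

  private static int nonNegative(Properties props, String key, int fallback) {
    int value = Integer.parseInt(props.getProperty(key, String.valueOf(fallback)).trim());
    if (value < 0) {
      throw new IllegalArgumentException(key + " must be >= 0, got " + value);
    }
    return value;
  }

  /** Classification runs only for failed jobs, and only when the feature is enabled. */
  boolean shouldClassify(boolean jobFailed) {
    return enabled && jobFailed;
  }

  /** Keeps at most maxErrorsInFinal entries for the consolidated final error. */
  <T> List<T> capForFinal(List<T> errors) {
    return errors.size() <= maxErrorsInFinal ? errors : errors.subList(0, maxErrorsInFinal);
  }
}
```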
### Tests
- [x] My PR adds the following unit tests:
  - **ErrorClassifierTest**: A unit test suite that checks, for a given list of issues with severity ERROR:
    - Pattern matching and classification logic
    - Priority-based error selection when multiple patterns match
    - Early stopping optimization
    - Edge cases (null/empty errors, no matches)
    - Integration with the InMemory ErrorPatternStore implementation
### Commits
- [x] My commits all reference JIRA issues in their subject lines, and I
have squashed multiple commits if they address the same issue. In addition, my
commits follow the guidelines from "[How to write a good git commit
message](http://chris.beams.io/posts/git-commit/)":
1. Subject is separated from body by a blank line
2. Subject is limited to 50 characters
3. Subject does not end with a period
4. Subject uses the imperative mood ("add", not "adding")
5. Body wraps at 72 characters
6. Body explains "what" and "why", not "how"
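Rules 1, 2, 3 and 5, plus the JIRA reference in the subject line, are mechanical enough to check automatically; the imperative-mood and "what and why" rules still need a human reader. The `CommitMessageChecker` class below is a hypothetical helper, not part of Gobblin's tooling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical checker for the mechanically verifiable commit-message rules. */
final class CommitMessageChecker {
  private static final Pattern JIRA_REF = Pattern.compile("\\[GOBBLIN-\\d+\\]");

  private CommitMessageChecker() {}

  /** Returns a list of rule violations; an empty list means the message passes these checks. */
  static List<String> check(String message) {
    List<String> problems = new ArrayList<>();
    String[] lines = message.split("\\r?\\n", -1);
    String subject = lines[0];
    if (!JIRA_REF.matcher(subject).find()) {
      problems.add("subject should reference a JIRA issue, e.g. [GOBBLIN-XXX]");
    }
    // Rule 1: blank line between subject and body.
    if (lines.length > 1 && !lines[1].trim().isEmpty()) {
      problems.add("subject must be separated from body by a blank line");
    }
    // Rule 2: subject length.
    if (subject.length() > 50) {
      problems.add("subject is " + subject.length() + " characters; limit is 50");
    }
    // Rule 3: no trailing period.
    if (subject.endsWith(".")) {
      problems.add("subject must not end with a period");
    }
    // Rule 5: body wraps at 72 characters.
    for (int i = 1; i < lines.length; i++) {
      if (lines[i].length() > 72) {
        problems.add("body line " + (i + 1) + " exceeds 72 characters");
      }
    }
    return problems;
  }
}
```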
Issue Time Tracking
-------------------
Worklog Id: (was: 974732)
Remaining Estimate: 0h
Time Spent: 10m
> Implement Error Classification based on execution issues
> --------------------------------------------------------
>
> Key: GOBBLIN-2211
> URL: https://issues.apache.org/jira/browse/GOBBLIN-2211
> Project: Apache Gobblin
> Issue Type: Bug
> Components: gobblin-service
> Reporter: Abhishek Jain
> Assignee: Abhishek Tiwari
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Implement Error Classification to categorize the failure reason based on
> issues encountered.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)