[ 
https://issues.apache.org/jira/browse/GOBBLIN-2211?focusedWorklogId=974732&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-974732
 ]

ASF GitHub Bot logged work on GOBBLIN-2211:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 16/Jul/25 05:50
            Start Date: 16/Jul/25 05:50
    Worklog Time Spent: 10m 
      Work Description: NamsB7 opened a new pull request, #4121:
URL: https://github.com/apache/gobblin/pull/4121

   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I 
have checked off all the steps below!
   
   
   ### JIRA
   - [x] My PR addresses the following [Gobblin 
JIRA](https://issues.apache.org/jira/browse/GOBBLIN-2211) issues and references 
them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
       - https://issues.apache.org/jira/browse/GOBBLIN-2211
   
   
   ### Description
   - [x] Here are some details about my PR, including screenshots (if 
applicable):
   - Introduces an error classification system for Gobblin jobs, enabling 
automatic, priority-based categorization of job failures based on configurable 
error pattern descriptions.
   - Modified job execution flow to classify errors on job failure
   - Integrated with existing JobIssueEventHandler for error logging
   - No breaking changes to existing functionality
   - Classification only runs on job failures when enabled
   
   #### Why These Changes
   - Provides intelligent error analysis by matching failures against
     known patterns
   - Consolidates multiple errors into a single, prioritized final error with 
enriched context
   - Suggests probable root causes rather than definitive diagnosis
   - Appends pattern-matched summaries to enhance error visibility
   - Reduces time spent manually correlating similar failures
   - Builds organizational knowledge of common failure patterns
   
   ##### Key features
   - Pattern-based Classification: Uses regex patterns to match and categorize 
errors
   - Priority-based Selection: Returns the highest priority error when multiple 
patterns match
   - Extensible Storage: Pluggable storage backend (in-memory, MySQL, etc.)
   - Performance Optimized: Early stopping mechanism to avoid unnecessary 
pattern matching
   - Configurable: Dynamic SQL column sizes and table names
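   
   The matching, priority selection, and early stopping described above can be 
sketched roughly as follows. This is a minimal illustration, not the code in 
this PR: the class `ErrorClassifierSketch`, the `ErrorPattern` holder, the 
`classify` signature, and the convention that a lower number means a higher 
priority are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical sketch; names and signatures are not taken from the PR.
final class ErrorClassifierSketch {

  /** Illustrative pattern holder: a regex, a category name, and a priority. */
  static final class ErrorPattern {
    final Pattern regex;
    final String category;
    final int priority; // assumption: lower value = higher priority

    ErrorPattern(String regex, String category, int priority) {
      this.regex = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
      this.category = category;
      this.priority = priority;
    }
  }

  /**
   * Returns the category of the highest-priority pattern that matches any of
   * the first maxErrorsToProcess error messages, or empty if nothing matches.
   */
  static Optional<String> classify(List<String> errors,
                                   List<ErrorPattern> patterns,
                                   int maxErrorsToProcess) {
    if (errors == null || errors.isEmpty() || patterns.isEmpty()) {
      return Optional.empty();
    }
    // Try the most important patterns first.
    List<ErrorPattern> sorted = new ArrayList<>(patterns);
    sorted.sort(Comparator.comparingInt(p -> p.priority));
    int topPriority = sorted.get(0).priority;

    ErrorPattern best = null;
    int processed = 0;
    for (String error : errors) {
      if (processed++ >= maxErrorsToProcess) {
        break; // cap the amount of work per failed job
      }
      if (error == null) {
        continue;
      }
      for (ErrorPattern p : sorted) {
        if (best != null && p.priority >= best.priority) {
          break; // remaining patterns cannot improve on the current best
        }
        if (p.regex.matcher(error).find()) {
          best = p;
          break;
        }
      }
      if (best != null && best.priority == topPriority) {
        break; // early stop: nothing can outrank the top-priority pattern
      }
    }
    return Optional.ofNullable(best).map(p -> p.category);
  }
}
```

   With patterns for "permission denied" (priority 1) and "timed out" 
(priority 2), a job that reports both a timeout and a permission error would 
be classified under the priority 1 category in this sketch, and scanning stops 
as soon as that match is found.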
   
   - This change introduces two major new components:
       - ErrorClassifier: New class defining the contract for error 
classification
       - ErrorPatternStore: New interface for managing error pattern 
persistence 
   
   ##### Configurations
   - `error.regex.db.table.key`: MySQL table for error patterns
   - `error.categories.db.table.key`: MySQL table for error categories
   - `error.regex.max.varchar.size`: Configurable VARCHAR size for pattern 
storage
   - `error.category.max.varchar.size`: Configurable VARCHAR size for category 
names
   
   ###### Service Configurations
   - `errorPatternStore.class`: Configures the pattern store implementation
   - `errorClassification.enabled`: Toggles classification. Default: disabled
   - `errorClassification.maxErrorsInFinal`: Caps final error count
   - `errorClassification.maxErrorsToProcess`: Limits errors processed for 
performance
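   
   As a rough illustration of how the service settings above might be read, the 
sketch below loads them from a plain `java.util.Properties` object. Everything 
except the four key names and the disabled-by-default behaviour is assumed for 
the example: the `ErrorClassificationSettings` class, the use of `Properties` as 
a stand-in for the actual configuration mechanism, and the numeric defaults are 
hypothetical.

```java
import java.util.Properties;

// Hypothetical settings holder; not part of the PR.
final class ErrorClassificationSettings {
  final String patternStoreClass;   // null when not configured
  final boolean enabled;
  final int maxErrorsInFinal;
  final int maxErrorsToProcess;

  private ErrorClassificationSettings(String patternStoreClass, boolean enabled,
                                      int maxErrorsInFinal, int maxErrorsToProcess) {
    this.patternStoreClass = patternStoreClass;
    this.enabled = enabled;
    this.maxErrorsInFinal = maxErrorsInFinal;
    this.maxErrorsToProcess = maxErrorsToProcess;
  }

  static ErrorClassificationSettings from(Properties props) {
    return new ErrorClassificationSettings(
        props.getProperty("errorPatternStore.class"),
        // Classification is disabled unless explicitly switched on.
        Boolean.parseBoolean(props.getProperty("errorClassification.enabled", "false")),
        // The numeric defaults below are placeholders chosen for this sketch.
        Integer.parseInt(props.getProperty("errorClassification.maxErrorsInFinal", "10")),
        Integer.parseInt(props.getProperty("errorClassification.maxErrorsToProcess", "1000")));
  }
}
```

   Keeping the toggle off by default means existing deployments see no change 
in behaviour until `errorClassification.enabled` is set to `true`.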
   
   ### Tests
   - [x] My PR adds the following unit tests:
     - **ErrorClassifierTest**: Simple unit test suite checking, for a given 
list of issues with severity ERROR:
       - Pattern matching and classification logic
       - Priority-based error selection when multiple patterns match
       - Early stopping optimization
       - Edge cases (null/empty errors, no matches)
       - Integration with InMemory ErrorPatternStore implementation
   
   ### Commits
   - [x] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
       1. Subject is separated from body by a blank line
       2. Subject is limited to 50 characters
       3. Subject does not end with a period
       4. Subject uses the imperative mood ("add", not "adding")
       5. Body wraps at 72 characters
       6. Body explains "what" and "why", not "how"
   

Issue Time Tracking
-------------------

            Worklog Id:     (was: 974732)
    Remaining Estimate: 0h
            Time Spent: 10m

> Implement Error Classification based on execution issues
> --------------------------------------------------------
>
>                 Key: GOBBLIN-2211
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-2211
>             Project: Apache Gobblin
>          Issue Type: Bug
>          Components: gobblin-service
>            Reporter: Abhishek Jain
>            Assignee: Abhishek Tiwari
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Implement Error Classification to categorize the failure reason based on 
> issues encountered.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
