[ 
https://issues.apache.org/jira/browse/GOBBLIN-1419?focusedWorklogId=582717&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582717
 ]

ASF GitHub Bot logged work on GOBBLIN-1419:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Apr/21 17:24
            Start Date: 14/Apr/21 17:24
    Worklog Time Spent: 10m 
      Work Description: ZihanLi58 commented on a change in pull request #3255:
URL: https://github.com/apache/gobblin/pull/3255#discussion_r613437586



##########
File path: gobblin-compaction/src/main/java/org/apache/gobblin/compaction/verify/CompactionThresholdVerifier.java
##########
@@ -60,7 +62,8 @@ public CompactionThresholdVerifier(State state) {
    * dataset. To avoid scalability issue, we choose a stateless approach where each dataset tracks
    * record count by themselves and persist it in the file system)
    *
-   * @return true iff the difference exceeds the threshold or this is the first time compaction
+   * @return true if the difference exceeds the threshold or this is the first time compaction or

Review comment:
       The verifier logic is that if any verifier fails the dataset, the 
compaction will not run. In this case, if the GMCE verifier says the dataset 
needs to be re-compacted but the threshold verifier says it does not, the 
dataset will be skipped. That is why I embedded the logic here. 
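
A minimal sketch of the reasoning above, not the actual Gobblin code: every name here (VerifierChainSketch, DatasetState, thresholdOnly, gmceOnly, thresholdWithGmceCheck) is hypothetical, and the only behavior taken from the comment is that compaction runs only when all verifiers pass. It shows why a separate GMCE verifier cannot force a re-compaction, while a check embedded in the threshold verifier can.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class VerifierChainSketch {

    /** Hypothetical per-dataset state; not a Gobblin class. */
    public static final class DatasetState {
        final double recordCountDelta; // relative change in record count since the last compaction
        final boolean gmceEmitted;     // whether the previous compaction emitted its GMCE

        public DatasetState(double recordCountDelta, boolean gmceEmitted) {
            this.recordCountDelta = recordCountDelta;
            this.gmceEmitted = gmceEmitted;
        }
    }

    /** Assumed chain semantics: compaction runs only if every verifier passes the dataset. */
    public static boolean shouldCompact(List<Predicate<DatasetState>> verifiers, DatasetState state) {
        return verifiers.stream().allMatch(v -> v.test(state));
    }

    /** Threshold check alone: passes only when the delta exceeds the threshold. */
    public static Predicate<DatasetState> thresholdOnly(double threshold) {
        return s -> s.recordCountDelta > threshold;
    }

    /** A separate GMCE check: passes only when the previous GMCE was not emitted. */
    public static Predicate<DatasetState> gmceOnly() {
        return s -> !s.gmceEmitted;
    }

    /** Threshold check with the GMCE condition embedded as an extra "or" branch. */
    public static Predicate<DatasetState> thresholdWithGmceCheck(double threshold) {
        return s -> !s.gmceEmitted || s.recordCountDelta > threshold;
    }

    public static void main(String[] args) {
        // Previous compaction renamed its files but failed to emit the GMCE; no new records since.
        DatasetState failedEmit = new DatasetState(0.0, false);

        // Separate verifiers: the threshold verifier vetoes, so the dataset is skipped.
        System.out.println(shouldCompact(Arrays.asList(gmceOnly(), thresholdOnly(0.05)), failedEmit)); // false

        // Embedded check: the single verifier passes, so the dataset is re-compacted.
        System.out.println(shouldCompact(Arrays.asList(thresholdWithGmceCheck(0.05)), failedEmit)); // true
    }
}
```

Under the same assumed semantics, a standalone GMCE verifier would also veto every dataset whose GMCE was emitted successfully, which is a further reason to fold the condition into the threshold verifier.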




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 582717)
    Time Spent: 40m  (was: 0.5h)

> Error handling for compaction pipeline on GMCE emitted error
> ------------------------------------------------------------
>
>                 Key: GOBBLIN-1419
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1419
>             Project: Apache Gobblin
>          Issue Type: Task
>            Reporter: Zihan Li
>            Priority: Major
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Now compaction file publishing and GMCE emission happen separately. If the 
> file rename succeeds but emitting the GMCE fails, we end up with no GMCE 
> emitted for the compaction. So we need an error-handling strategy for 
> compaction such that when emitting the GMCE fails, the next job treats the 
> previous compaction as failed and re-compacts the data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
