[
https://issues.apache.org/jira/browse/GOBBLIN-1001?focusedWorklogId=358097&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-358097
]
ASF GitHub Bot logged work on GOBBLIN-1001:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 11/Dec/19 21:28
Start Date: 11/Dec/19 21:28
Worklog Time Spent: 10m
Work Description: zxcware commented on issue #2846: [GOBBLIN-1001]
Implement TimePartitionGlobFinder
URL:
https://github.com/apache/incubator-gobblin/pull/2846#issuecomment-564740742
@autumnust Yeah, `yesterdayPartition` is really specific. I'm thinking about
generalizing it to `enforcePreviousN` partitions (open to better name
suggestions). Its main responsibility is to create an `EmptyFileSystemDataset`
if any of the previous N partitions doesn't exist, signaling quiet time. In
addition, it focuses on time partitions and supports different time formats
(not limited to `yyyy/MM/dd`), unlike the vanilla `DefaultFileSystemGlobFinder`.
(I'm adding comments about its usage.)
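The check described above can be sketched roughly as follows. This is only an
illustration, not the code in the PR: the class `PreviousPartitionCheck`, the
method `missingPartitions`, and the `exists` predicate are hypothetical names,
daily partitions are assumed, and plain path strings stand in for the
`EmptyFileSystemDataset` instances that would be created for missing partitions.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical sketch; not the TimePartitionGlobFinder from the PR. */
public class PreviousPartitionCheck {

  /**
   * Returns the paths of the previous n daily partitions (counting back from
   * today, excluding today) for which the exists predicate returns false.
   * A real finder would emit an empty dataset for each of these paths.
   */
  public static List<String> missingPartitions(
      String datasetRoot, String timePattern, LocalDate today, int n,
      Predicate<String> exists) {
    // The pattern is configurable, e.g. "yyyy/MM/dd" or "yyyy-MM-dd".
    DateTimeFormatter formatter = DateTimeFormatter.ofPattern(timePattern);
    List<String> missing = new ArrayList<>();
    for (int i = 1; i <= n; i++) {
      String path = datasetRoot + "/" + today.minusDays(i).format(formatter);
      if (!exists.test(path)) {
        missing.add(path);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    // Assumed file system state: only the 2019/12/10 partition exists.
    Set<String> existing = Collections.singleton("/data/db/table/2019/12/10");
    List<String> missing = missingPartitions(
        "/data/db/table", "yyyy/MM/dd", LocalDate.of(2019, 12, 11), 2,
        existing::contains);
    System.out.println(missing); // prints [/data/db/table/2019/12/09]
  }
}
```

Passing the existence check in as a predicate keeps the sketch independent of
any particular file system API.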
With `enforcePreviousN`, it's tied even less to company requirements, which
makes it more justifiable to open-source. In our use case, we capture the quiet
time signal to publish the compaction watermark. Others can capture it to do
different operations.
Another consideration is that we would have to make internal copies of the open
source compaction constructs (`MRTask`, `Verifier`, `CompactionAction`) if
`EmptyFileSystemDataset` were made internal. Compared to making
`EmptyFileSystemDataset` a first-class citizen of the open source compaction
flow, the implementation and maintenance cost of internalization is high, given
that most of our pipelines use open source compaction constructs.
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 358097)
Time Spent: 50m (was: 40m)
> Implement TimePartitionGlobFinder
> ---------------------------------
>
> Key: GOBBLIN-1001
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1001
> Project: Apache Gobblin
> Issue Type: Task
> Reporter: Zhixiong Chen
> Assignee: Zhixiong Chen
> Priority: Major
> Time Spent: 50m
> Remaining Estimate: 0h
>