[ https://issues.apache.org/jira/browse/GOBBLIN-6?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Abhishek Tiwari resolved GOBBLIN-6.
-----------------------------------
Resolution: Fixed
Issue resolved by pull request #1993
[https://github.com/apache/incubator-gobblin/pull/1993]
> Support eventual consistent filesystems like S3
> -----------------------------------------------
>
> Key: GOBBLIN-6
> URL: https://issues.apache.org/jira/browse/GOBBLIN-6
> Project: Apache Gobblin
> Issue Type: Task
> Reporter: Abhishek Tiwari
>
> In our environment we use Gobblin to ship logs to S3. Gobblin works pretty
> well on its own, but in a couple of places it assumes the underlying
> filesystem is consistent, which is not true in some cases, such as in some
> S3 regions.
> To overcome this, I added retries for when a newly created directory or file
> does not exist immediately after publishing.
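The retry-on-missing-path idea described above can be sketched in plain Java. This is a minimal, self-contained illustration under stated assumptions, not code taken from pull request #1993: the class `RetryUntilVisible`, its method `waitUntilVisible`, and the exponential-backoff policy are hypothetical, and a `BooleanSupplier` stands in for a real filesystem existence check.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical helper (not Gobblin's actual API): polls a visibility check
 * until it succeeds or the attempt budget is exhausted, sleeping between
 * attempts with exponential backoff.
 */
public final class RetryUntilVisible {

  private RetryUntilVisible() {}

  /**
   * Returns the number of attempts it took for {@code check} to return true.
   *
   * @throws IllegalStateException if the check never succeeds within maxAttempts
   */
  public static int waitUntilVisible(BooleanSupplier check, int maxAttempts, long initialBackoffMs)
      throws InterruptedException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    long backoffMs = initialBackoffMs;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (check.getAsBoolean()) {
        return attempt; // the file or directory is now visible
      }
      if (attempt < maxAttempts) {
        Thread.sleep(backoffMs); // give the eventually consistent store time to catch up
        backoffMs *= 2;          // double the wait before the next check
      }
    }
    throw new IllegalStateException("Path still not visible after " + maxAttempts + " attempts");
  }

  public static void main(String[] args) throws InterruptedException {
    // Simulate a listing that only reflects a new directory on the third check,
    // standing in for an existence check against an eventually consistent store.
    int[] calls = {0};
    int attempts = waitUntilVisible(() -> ++calls[0] >= 3, 5, 10L);
    System.out.println("visible after " + attempts + " attempts"); // prints "visible after 3 attempts"
  }
}
```

With Hadoop, the check could wrap `FileSystem.exists(Path)`; because that method throws `IOException`, a real caller would have to handle or rethrow it inside the lambda. Bounding the number of attempts keeps a genuinely missing path from stalling the job forever: the goal is to avoid failing right away, not to never fail.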
> The following changes were added:
> - Moved RetryWriter to gobblin-core so it can be used in WriterUtils without
> introducing circular dependencies.
> - Introduced mkdirsWithRecursivePermissionWithRetry, which retries when a
> directory does not exist right after creation on an eventually consistent
> filesystem.
> - Added retries to publishers such as TimestampDataPublisher and
> TimePartitionedDataPublisher (data.publisher.retry.enabled=true) to support
> eventually consistent filesystem targets.
> - The temporary filesystem of a compaction job can be set with
> compaction.tmp.fs, so HDFS can be used for temporary data while the result
> is stored on S3. Previously, the temporary and target filesystems could not
> differ.
> - Retries can be enabled for compaction (compaction.retry.enabled=true) so
> that a job does not fail immediately when a directory does not show up right
> away, which can happen on an eventually consistent filesystem.
> - Added the dataset name to the compaction MR job name, which makes it
> significantly easier to identify which compaction job belongs to which
> dataset.
> - Minor modifications to support non-Avro file extensions.
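For reference, the three job properties named in the change list above can be read together as in the sketch below. It is an illustrative assumption, not Gobblin code: the class `EventualConsistencyConfigSketch` and the value `hdfs://namenode:8020` are hypothetical (the value format of `compaction.tmp.fs` is assumed to be a filesystem URI), and only the JDK's `java.util.Properties` is used.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/**
 * Illustrative only: parses the job properties named in GOBBLIN-6 and reads
 * them back. The class name and the example values are hypothetical.
 */
public final class EventualConsistencyConfigSketch {

  private EventualConsistencyConfigSketch() {}

  // Job configuration fragment; the compaction.tmp.fs value is an assumed HDFS URI.
  static final String JOB_CONFIG = String.join("\n",
      "data.publisher.retry.enabled=true",
      "compaction.retry.enabled=true",
      "compaction.tmp.fs=hdfs://namenode:8020");

  static Properties load(String text) throws IOException {
    Properties props = new Properties();
    props.load(new StringReader(text)); // standard java.util.Properties parsing
    return props;
  }

  /** Reads a boolean flag, treating a missing key as false. */
  static boolean flag(Properties props, String key) {
    return Boolean.parseBoolean(props.getProperty(key, "false"));
  }

  public static void main(String[] args) throws IOException {
    Properties props = load(JOB_CONFIG);
    System.out.println("publisher retry: " + flag(props, "data.publisher.retry.enabled"));
    System.out.println("compaction retry: " + flag(props, "compaction.retry.enabled"));
    System.out.println("compaction tmp fs: " + props.getProperty("compaction.tmp.fs"));
  }
}
```

Treating a missing flag as false mirrors the opt-in wording of the change list: retries apply only when the corresponding `*.retry.enabled` property is set to `true`.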
>
> *Github Url* : https://github.com/linkedin/gobblin/pull/1993
> *Github Reporter* : *treff7es*
> *Github Created At* : 2017-07-03T13:36:42Z
> *Github Updated At* : 2017-07-10T18:24:51Z
> h3. Comments
> ----
> *treff7es* wrote on 2017-07-03T13:41:24Z : I split my previous pull request
> (#1686) into multiple ones. This one is about supporting eventually
> consistent filesystems. I will create a separate pull request for the
> JSON-specific changes, which live in a separate module.
> This PR also contains a change that the following PR needs in order to
> support non-.avro file extensions.
>
> *Github Url* :
> https://github.com/linkedin/gobblin/pull/1993#issuecomment-312648695
> ----
> [~ibuenros] wrote on 2017-07-10T18:19:48Z : @htran1 can you take a look?
>
> *Github Url* :
> https://github.com/linkedin/gobblin/pull/1993#issuecomment-314191228
> ----
> [~hutran] wrote on 2017-07-10T18:24:51Z : @abti, can you also take a look
> since you reviewed the original PR that this was split from?
>
> *Github Url* :
> https://github.com/linkedin/gobblin/pull/1993#issuecomment-314192685
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)