[ 
https://issues.apache.org/jira/browse/GOBBLIN-6?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Tiwari resolved GOBBLIN-6.
-----------------------------------
    Resolution: Fixed

Issue resolved by pull request #1993
[https://github.com/apache/incubator-gobblin/pull/1993]

> Support eventual consistent filesystems like S3
> -----------------------------------------------
>
>                 Key: GOBBLIN-6
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-6
>             Project: Apache Gobblin
>          Issue Type: Task
>            Reporter: Abhishek Tiwari
>
> In our environment we use Gobblin for shipping logs to S3. Gobblin works
> pretty well on its own, but in a couple of places it assumes the underlying
> fs is consistent, which is not true in some cases, such as in some S3
> regions.
> To overcome this, I added retries for when a created dir/file does not
> exist right after publish.
> The following changes were added:
> - Moved RetryWriter to gobblin-core so that it can be used in WriterUtils
> without introducing circular dependencies.
> - Introduced mkdirsWithRecursivePermissionWithRetry, which retries if a
> directory does not exist right after creation on an eventually consistent
> fs.
> - Added retry to publishers (data.publisher.retry.enabled=true) such as
> TimestampDataPublisher and TimePartitionedDataPublisher to support
> eventually consistent filesystem targets.
> - A tmp fs can be specified with compaction.tmp.fs in a compaction job, so
> that HDFS can be used for the tmp fs while the result is stored on S3.
> Earlier it was not possible to use different filesystems for tmp and target.
> - Retry can be enabled for compaction (compaction.retry.enabled=true) if you
> don't want the job to fail immediately when a directory does not show up
> right away, which can happen on an eventually consistent fs.
> - Added the dataset name to the compaction MR job name, which makes it
> significantly easier to identify which compaction job belongs to which
> dataset.
> - Some minor modifications to support non-Avro extensions.
>  
> *Github Url* : https://github.com/linkedin/gobblin/pull/1993 
> *Github Reporter* : *treff7es* 
> *Github Created At* : 2017-07-03T13:36:42Z 
> *Github Updated At* : 2017-07-10T18:24:51Z 
> h3. Comments 
> ----
> *treff7es* wrote on 2017-07-03T13:41:24Z : I split up my previous pull
> request (#1686) into multiple ones. This one is about supporting eventually
> consistent filesystems. I will create a separate pull request for the
> JSON-specific changes, which are a separate module.
> This PR also contains a change that the following PR will need in order to
> support non-.avro file extensions.
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/1993#issuecomment-312648695 
> ----
> [~ibuenros] wrote on 2017-07-10T18:19:48Z : @htran1 can you take a look? 
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/1993#issuecomment-314191228 
> ----
> [~hutran] wrote on 2017-07-10T18:24:51Z : @abti, can you also take a look 
> since you reviewed the original PR that this was split from? 
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/1993#issuecomment-314192685
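
The retry-after-create idea described in the issue can be illustrated with a minimal sketch. This is not the code from pull request #1993: the class name, the `waitUntilVisible` helper, and its parameters are hypothetical, and the existence check is passed in as a plain `BooleanSupplier` so that the example runs without Hadoop or S3. It only shows the general pattern of polling for a path that may not be visible immediately after it was created.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Illustrative sketch only; names and signatures are assumptions, not Gobblin APIs.
class EventualConsistencyRetrySketch {

    /**
     * Polls the supplied existence check up to maxAttempts times, sleeping
     * sleepMillis between attempts. Returns true as soon as the check
     * succeeds, false if it never does or if the thread is interrupted.
     */
    static boolean waitUntilVisible(BooleanSupplier exists, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (exists.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt status and give up.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulates a filesystem where a new directory becomes visible
        // only on the third existence check.
        AtomicInteger checks = new AtomicInteger();
        boolean visible = waitUntilVisible(() -> checks.incrementAndGet() >= 3, 5, 10L);
        System.out.println("visible=" + visible + " after " + checks.get() + " checks");
    }
}
```

In a real job the supplier would wrap a filesystem existence check on the published path, and the attempt count and sleep interval would come from job configuration rather than constants.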



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
