[ https://issues.apache.org/jira/browse/GOBBLIN-6?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Abhishek Tiwari resolved GOBBLIN-6.
-----------------------------------
    Resolution: Fixed

Issue resolved by pull request #1993
[https://github.com/apache/incubator-gobblin/pull/1993]

> Support eventual consistent filesystems like S3
> -----------------------------------------------
>
>                 Key: GOBBLIN-6
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-6
>             Project: Apache Gobblin
>          Issue Type: Task
>            Reporter: Abhishek Tiwari
>
> In our environment we use Gobblin for shipping logs to S3. Gobblin on its
> own works pretty well, but in a couple of places it assumes the underlying
> filesystem is consistent, which is not true in some cases, for example in
> some S3 regions.
> To overcome this, I added a couple of retries for the case where the created
> dir/file does not exist right after publish.
> The following changes were added:
> - Moved RetryWriter to gobblin-core so that it can be used in WriterUtils
> without introducing circular dependencies.
> - Introduced mkdirsWithRecursivePermissionWithRetry, which retries if the
> directory does not exist right after creation on an eventually consistent
> filesystem.
> - Added retry to publishers (data.publisher.retry.enabled=true) such as
> TimestampDataPublisher and TimePartitionedDataPublisher to support
> eventually consistent filesystem targets.
> - The tmp filesystem can be specified with compaction.tmp.fs in a compaction
> job, which makes it possible to use HDFS for tmp and store the result on S3.
> Earlier it was not possible to use different filesystems for tmp and target.
> - Retry can be enabled for compaction (compaction.retry.enabled=true) if you
> don't want the job to fail immediately when a directory does not show up
> right away, which can happen on an eventually consistent filesystem.
> - Added the dataset name to the compaction MR job name, which makes it
> significantly easier to identify which compaction job belongs to which
> dataset.
> - Some minor modifications to support non-Avro extensions
>
> *Github Url* : https://github.com/linkedin/gobblin/pull/1993
> *Github Reporter* : *treff7es*
> *Github Created At* : 2017-07-03T13:36:42Z
> *Github Updated At* : 2017-07-10T18:24:51Z
> h3. Comments
> ----
> *treff7es* wrote on 2017-07-03T13:41:24Z : I split up my previous pull
> request (#1686) into multiple ones. This one is about supporting eventually
> consistent filesystems. I will create a separate pull request for the
> JSON-specific changes, which are a separate module.
> This PR also contains a change that the following PR will need in order to
> support non-.avro file extensions.
>
> *Github Url* :
> https://github.com/linkedin/gobblin/pull/1993#issuecomment-312648695
> ----
> [~ibuenros] wrote on 2017-07-10T18:19:48Z : @htran1 can you take a look?
>
> *Github Url* :
> https://github.com/linkedin/gobblin/pull/1993#issuecomment-314191228
> ----
> [~hutran] wrote on 2017-07-10T18:24:51Z : @abti, can you also take a look
> since you reviewed the original PR that this was split from?
>
> *Github Url* :
> https://github.com/linkedin/gobblin/pull/1993#issuecomment-314192685

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
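
The retry idea described in the issue (re-checking for a just-created directory or file a few times before failing) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from pull request #1993: the class VisibilityRetry, the method waitUntilVisible, and its parameters are hypothetical names, and the existence check is passed in as a callback instead of going through a Hadoop FileSystem.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Gobblin: polls an existence check so that a
// caller does not fail immediately on an eventually consistent filesystem.
public class VisibilityRetry {

    /**
     * Calls existsCheck up to maxAttempts times, sleeping sleepMillis between
     * attempts. Returns the 1-based attempt on which the check first succeeded,
     * or -1 if it never succeeded or the thread was interrupted while waiting.
     */
    public static int waitUntilVisible(BooleanSupplier existsCheck,
                                       int maxAttempts,
                                       long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (existsCheck.getAsBoolean()) {
                return attempt;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and give up early.
                    Thread.currentThread().interrupt();
                    return -1;
                }
            }
        }
        return -1;
    }
}
```

A caller would create the directory first and then pass a check such as `() -> java.nio.file.Files.exists(path)`, treating a result of -1 as a failure. Whether such a retry runs at all would presumably be gated by a setting like the data.publisher.retry.enabled or compaction.retry.enabled properties named in the issue.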