[ 
https://issues.apache.org/jira/browse/FLINK-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16269041#comment-16269041
 ] 

Stephan Ewen commented on FLINK-5789:
-------------------------------------

We may want to add further abstractions to the file system.

The S3 API has an interesting approach, allowing you to upload multiple 
individual chunks (multipart upload) and then issue a separate request to 
commit a set of these parts. That could be used, for example, to stage/flush 
parts on checkpoint, but not commit (publish) until it is time to roll over the 
bucket.
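A minimal sketch of the stage-then-commit idea, assuming a hypothetical in-memory store. `InMemoryObjectStore`, `initiate`, `uploadPart` and `complete` are illustrative names, not the AWS SDK API, and the sketch ignores real S3 constraints such as minimum part sizes. Parts flushed on each checkpoint stay invisible until the single commit at bucket roll-over.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.SortedMap;
import java.util.TreeMap;

// Demo: stage a part on each checkpoint, publish only at bucket roll-over.
class MultipartStagingDemo {
    public static void main(String[] args) {
        InMemoryObjectStore store = new InMemoryObjectStore();
        String key = "bucket-0/part-0";
        String uploadId = store.initiate(key);

        // Checkpoints 1 and 2: flush buffered records as staged parts.
        store.uploadPart(uploadId, 1, "a,b\n".getBytes(StandardCharsets.UTF_8));
        store.uploadPart(uploadId, 2, "c,d\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(store.get(key).isPresent()); // false: parts are only staged

        // Bucket roll-over: commit all staged parts with one request.
        store.complete(uploadId);
        System.out.println(store.get(key).isPresent()); // true: object is published
    }
}

// Hypothetical in-memory stand-in for an object store with S3-style multipart uploads.
class InMemoryObjectStore {
    private final Map<String, byte[]> published = new HashMap<>();
    private final Map<String, String> keyByUpload = new HashMap<>();
    private final Map<String, SortedMap<Integer, byte[]>> partsByUpload = new HashMap<>();
    private int nextId = 0;

    // Starts a multipart upload for the key and returns its upload id.
    String initiate(String key) {
        String uploadId = "upload-" + nextId++;
        keyByUpload.put(uploadId, key);
        partsByUpload.put(uploadId, new TreeMap<>());
        return uploadId;
    }

    // Records a part; nothing becomes visible under the key yet.
    void uploadPart(String uploadId, int partNumber, byte[] data) {
        partsByUpload.get(uploadId).put(partNumber, data.clone());
    }

    // Concatenates the staged parts in part-number order and publishes the result.
    void complete(String uploadId) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] part : partsByUpload.remove(uploadId).values()) {
            out.write(part, 0, part.length);
        }
        published.put(keyByUpload.remove(uploadId), out.toByteArray());
    }

    Optional<byte[]> get(String key) {
        return Optional.ofNullable(published.get(key));
    }
}
```

The separation matters for exactly-once output: a failure between checkpoints loses only uncommitted parts, and nothing partial is ever published.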

Because that is very S3-specific, it may make sense to have an abstraction for
temporary files, temporary regions, committing them, etc. on top of the
{{FileSystem}} abstraction.
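One possible shape for such a layer, sketched under assumptions: `StagingFileSystem`, `PendingFile` and `LocalStagingFileSystem` are hypothetical names, and `java.nio.file.Path` stands in for Flink's own path type. The local implementation stages into a hidden temporary file and publishes it with an atomic rename; an S3 implementation could map the same two calls onto a multipart upload and its completion.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Demo: data written through a PendingFile appears at the target path only on commit.
class StagingDemo {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("staging-demo");
        Path target = dir.resolve("part-0");
        PendingFile pending = new LocalStagingFileSystem().createPending(target);
        pending.write("hello\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(Files.exists(target)); // false: still staged
        pending.commit();
        System.out.println(Files.exists(target)); // true: published
    }
}

// Hypothetical handle to data that is written but not yet published.
interface PendingFile {
    void write(byte[] data) throws IOException;

    // Makes the staged data visible at the final path.
    void commit() throws IOException;
}

// Hypothetical abstraction layered on top of a file system.
interface StagingFileSystem {
    PendingFile createPending(Path finalPath) throws IOException;
}

// Local variant: stage into a hidden temp file in the same directory, publish by rename.
class LocalStagingFileSystem implements StagingFileSystem {
    @Override
    public PendingFile createPending(Path finalPath) throws IOException {
        Path dir = finalPath.toAbsolutePath().getParent();
        Path temp = Files.createTempFile(dir, "." + finalPath.getFileName(), ".inprogress");
        OutputStream out = Files.newOutputStream(temp);
        return new PendingFile() {
            @Override
            public void write(byte[] data) throws IOException {
                out.write(data);
            }

            @Override
            public void commit() throws IOException {
                out.close();
                // Same directory, so the rename can be atomic on local file systems.
                Files.move(temp, finalPath, StandardCopyOption.ATOMIC_MOVE);
            }
        };
    }
}
```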

> Make Bucketing Sink independent of Hadoop's FileSystem
> ------------------------------------------------------
>
>                 Key: FLINK-5789
>                 URL: https://issues.apache.org/jira/browse/FLINK-5789
>             Project: Flink
>          Issue Type: Bug
>          Components: Streaming Connectors
>    Affects Versions: 1.2.0, 1.1.4
>            Reporter: Stephan Ewen
>             Fix For: 1.5.0
>
>
> The {{BucketingSink}} is hard wired to Hadoop's FileSystem, bypassing Flink's 
> file system abstraction.
> This causes several issues:
>   - The bucketing sink behaves differently from other file sinks with
> respect to configuration
>   - Directly supported file systems (not through Hadoop), like the MapR File
> System, do not work with the BucketingSink in the same way as other file
> systems
>   - The previous point is all the more problematic given the effort to make
> Hadoop an optional dependency, and within other stacks (Mesos, Kubernetes,
> AWS, GCE, Azure) that ideally have no Hadoop dependency.
> We should port the {{BucketingSink}} to use Flink's FileSystem classes.
> To support the *truncate* functionality that is needed for the exactly-once 
> semantics of the Bucketing Sink, we should extend Flink's FileSystem 
> abstraction to have the methods
>   - {{boolean supportsTruncate()}}
>   - {{void truncate(Path, long)}}
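A minimal sketch of how the two proposed methods could be used on recovery, assuming a hypothetical interface `TruncatableFileSystem` (with `java.nio.file.Path` standing in for Flink's {{Path}}) and a hypothetical `SinkRecovery.restore` helper. On restore, the in-progress file is cut back to the length recorded at the last checkpoint; when truncation is unsupported, the sketch falls back to writing a side file with the valid length, which is only one possible fallback.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Demo: roll back bytes written after the last checkpoint.
class TruncateDemo {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("bucket-part", ".txt");
        Files.write(file, "committed\n".getBytes(StandardCharsets.UTF_8));
        long validLength = Files.size(file); // length recorded at the checkpoint
        // Records written after the checkpoint, before a failure.
        Files.write(file, "uncommitted\n".getBytes(StandardCharsets.UTF_8), StandardOpenOption.APPEND);
        SinkRecovery.restore(new LocalTruncatableFileSystem(), file, validLength);
        System.out.println(Files.size(file) == validLength); // true
    }
}

// Hypothetical extension mirroring the two proposed methods.
interface TruncatableFileSystem {
    boolean supportsTruncate();

    void truncate(Path path, long length) throws IOException;
}

// Local file system: truncation is available through FileChannel.
class LocalTruncatableFileSystem implements TruncatableFileSystem {
    @Override
    public boolean supportsTruncate() {
        return true;
    }

    @Override
    public void truncate(Path path, long length) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.WRITE)) {
            channel.truncate(length);
        }
    }
}

// Hypothetical recovery step of a bucketing sink.
class SinkRecovery {
    static void restore(TruncatableFileSystem fs, Path inProgress, long validLength) throws IOException {
        if (Files.size(inProgress) <= validLength) {
            return; // nothing was written after the checkpoint
        }
        if (fs.supportsTruncate()) {
            fs.truncate(inProgress, validLength);
        } else {
            // Fallback: record which prefix of the file is valid for downstream readers.
            Path marker = inProgress.resolveSibling(inProgress.getFileName() + ".valid-length");
            Files.write(marker, Long.toString(validLength).getBytes(StandardCharsets.UTF_8));
        }
    }
}
```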



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
