[ 
https://issues.apache.org/jira/browse/FLUME-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14558073#comment-14558073
 ] 

Mark Sun commented on FLUME-2701:
---------------------------------

I found that it is somewhat difficult to add this feature to HDFSEventSink, 
because an HDFSEventSink has only one current transaction but may have 
multiple target files open at the same time.
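
To make the difficulty concrete, here is a minimal sketch in Java. The class 
and method names are hypothetical and are not part of Flume; it only assumes 
that durability comes from closing a file rather than from hsync(). It shows 
the scheme check and why a single transaction could only be committed once 
every file it wrote to has been closed.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper for illustration only; not a Flume class. */
class WebHdfsCommitPolicy {

    /** Returns true when the target path uses the "webhdfs" URI scheme. */
    static boolean isWebHdfs(String path) {
        // Plain string parsing, so paths with escape sequences such as %Y are not rejected.
        int idx = path.indexOf("://");
        return idx > 0 && path.substring(0, idx).equalsIgnoreCase("webhdfs");
    }

    // Files written during the single open transaction and not yet closed.
    private final Set<String> openFiles = new HashSet<>();

    void recordWrite(String file) {
        openFiles.add(file);
    }

    void recordClose(String file) {
        openFiles.remove(file);
    }

    /**
     * Assumption: without hsync(), data is durable only after close,
     * so the one shared transaction may commit only when no touched file is still open.
     */
    boolean safeToCommit() {
        return openFiles.isEmpty();
    }
}
```

With two target files, closing (rolling) only one of them still leaves 
safeToCommit() false, which is the mismatch between one transaction and many 
files described above.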

> Adding WebHDFS support
> ----------------------
>
>                 Key: FLUME-2701
>                 URL: https://issues.apache.org/jira/browse/FLUME-2701
>             Project: Flume
>          Issue Type: New Feature
>            Reporter: Mark Sun
>
> I'm using HttpFs as an HDFS web gateway to handle data sent by Flume from 
> another datacenter over the Internet or a WAN. In my case a gateway is 
> necessary to minimize the footprint required to access HDFS, but the WebHDFS 
> API does not support hsync(), which Flume requires.
> HDFS syncs all data and metadata to DataNode disk before a file is closed, 
> and this also holds through the WebHDFS API. It seems to me that we can rely 
> on this guarantee to keep data safe when hsync() is unavailable. Personally, 
> I guess this is much easier than adding hsync() support to WebHDFS/HttpFs.
> Basically, the idea is to keep the transaction open until rolling occurs if 
> the scheme of the HDFS URI is “webhdfs”.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
