[
https://issues.apache.org/jira/browse/FLUME-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mark Sun updated FLUME-2701:
----------------------------
Attachment: webhdfs.2.patch
> Adding WebHDFS support
> ----------------------
>
> Key: FLUME-2701
> URL: https://issues.apache.org/jira/browse/FLUME-2701
> Project: Flume
> Issue Type: New Feature
> Reporter: Mark Sun
> Attachments: webhdfs.1.patch, webhdfs.2.patch
>
>
> I'm using HttpFs as an HDFS web gateway to receive data from Flume agents in
> another datacenter over the Internet or a WAN. In my case a gateway is
> necessary to minimize the footprint required to access HDFS, but the WebHDFS
> API does not support hsync(), which Flume requires.
> HDFS syncs all data and metadata to DataNode (DN) disk before a file is
> closed, and this also holds for the WebHDFS API. It seems to me that we can
> rely on this guarantee to keep data safe when hsync() is unavailable.
> Personally, I guess this is much easier than adding hsync() support to
> WebHDFS/HttpFs.
> Basically, the idea is to keep the transaction open until a roll occurs if
> the scheme of the HDFS URI is "webhdfs".
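
The idea described above could be sketched roughly as follows. This is only an
illustration and is not taken from the attached patches: the class
DeferredCommitSketch, its method names, and the example hosts are all
hypothetical, and commit() merely stands in for whatever the sink does when it
commits a Flume transaction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flume or of the attached patches.
// It models one decision: commit after every batch, or hold until the file rolls.
final class DeferredCommitSketch {

    /** True when the path starts with the "webhdfs" scheme (case-insensitive). */
    static boolean isWebHdfs(String hdfsPath) {
        int i = hdfsPath.indexOf("://");
        return i > 0 && hdfsPath.substring(0, i).equalsIgnoreCase("webhdfs");
    }

    private final boolean deferUntilRoll;
    private final List<String> uncommitted = new ArrayList<>();
    private int committedCount = 0;

    DeferredCommitSketch(String hdfsPath) {
        // Assumption: hsync() is unavailable exactly when the scheme is webhdfs.
        this.deferUntilRoll = isWebHdfs(hdfsPath);
    }

    /** Called after a batch is written; commits at once unless commits are deferred. */
    void onBatchWritten(List<String> batch) {
        uncommitted.addAll(batch);
        if (!deferUntilRoll) {
            commit(); // stands in for hsync() followed by a transaction commit
        }
    }

    /** Called when the file is rolled (closed); the close is treated as the durability point. */
    void onRoll() {
        commit();
    }

    private void commit() {
        committedCount += uncommitted.size();
        uncommitted.clear();
    }

    int committedCount() { return committedCount; }

    int pendingCount() { return uncommitted.size(); }
}
```

With a path such as webhdfs://gateway.example.com:14000/flume/events, events
stay pending until onRoll() is called; with an hdfs:// path they are committed
after each batch. The trade-off is that a long roll interval keeps the
transaction, and therefore the channel's events, held open for that long.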