Hello Tianyi HE,

I noticed that a similar design for a federation proxying model has just been proposed on Apache JIRA HDFS-10467. You might want to join the conversation there.

https://issues.apache.org/jira/browse/HDFS-10467

--Chris Nauroth

On 5/2/16, 10:32 AM, "Colin McCabe" <cmcc...@apache.org> wrote:

>Hi Tianyi HE,
>
>Thanks for sharing this! This reminds me of the httpfs daemon. This
>daemon basically sits in front of an HDFS cluster and accepts requests,
>which it serves by forwarding them to the underlying HDFS instance.
>There is some documentation about it here:
>https://hadoop.apache.org/docs/stable/hadoop-hdfs-httpfs/index.html
>
>Since httpfs uses an org.apache.hadoop.fs.FileSystem instance, it seems
>like you could plug in the org.apache.hadoop.fs.viewfs.ViewFileSystem
>class and be up and running with federation. I haven't tried this, but I
>would expect that it would work, unless there are bugs in ViewFS itself.
>
>The big advantage of httpfs is that it provides a webhdfs-style REST
>interface. As you said, this kind of interface makes it simple to use
>any language with REST bindings, without worrying about using a thick
>client.
>
>The big disadvantage of httpfs is that you must move both metadata and
>data operations through the httpfs daemon. This could become a
>performance bottleneck. It seems like you are concerned about this
>bottleneck.
>
>We also have webhdfs. Unlike httpfs, webhdfs doesn't require all the
>data to move through its daemon. With webhdfs, the client talks to
>DataNodes directly.
>
>I wonder if extending httpfs or webhdfs would be a better approach than
>starting from scratch. There is a maintenance burden for adding new
>services and daemons. This was our motivation for removing hftp, for
>example. It's certainly something to think about.
>
>best,
>Colin
>
>
>On Thu, Apr 28, 2016, at 17:55, 何天一 wrote:
>> Hey guys,
>>
>> My associates have investigated HDFS federation recently, which turns
>> out to be quite a good solution for improving scalability on the
>> NameNode/DataNode side.
>>
>> However, we encountered some problems on the client side.
>> Since:
>> A) For historical reasons, we use clients in multiple languages to
>> access HDFS (e.g. python-snakebite, or perhaps libhdfs++). So we either
>> implement multiple versions of ViewFS or we give up the consistent view
>> (which can be confusing to users).
>> B) We have Hadoop client configuration deployed on client nodes, which
>> we do not have control over. Also, releasing a new configuration could
>> be a really heavy operation, because it needs to be pushed to several
>> thousand nodes while maintaining consistency (say a node is down
>> throughout the operation and then comes back online; it could still
>> hold a stale version of the configuration).
>>
>> So we set out to explore another solution to these problems, and came
>> up with a proxy model.
>> That is, build an RPC proxy in front of the NameNodes.
>> All clients talk to the proxy when they need to consult a NameNode, and
>> the proxy decides which NameNode the request should go to according to
>> the mount table.
>> This solved our problem. All clients are seamlessly upgraded with
>> federation support.
>> We open sourced the proxy recently: https://github.com/bytedance/nnproxy
>> (BTW, all kinds of feedback are welcome)
>>
>> But there are still a few issues. For example, several modifications
>> need to be made inside Hadoop IPC to support RPC forwarding. We released
>> the corresponding patches with the nnproxy project (
>> https://github.com/bytedance/nnproxy/tree/master/hadoop-patches). But it
>> would be better to have these merged into Apache trunk. Does anyone
>> think it's worthwhile?
>>
>>
>> --
>> Cheers,
>> Tianyi HE
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
>For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
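
For readers following the thread, the mount-table routing step in the quoted proposal (the proxy picks a NameNode for each request path) could be sketched roughly as below. This is only an illustration under stated assumptions, not how nnproxy or ViewFS actually implements it: the class name MountTableSketch, its methods, and the example.com NameNode addresses are all hypothetical, and the matching rule assumed here is "longest mount point that is a whole-component prefix of the path".

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of mount-table routing in front of several NameNodes.
 * Not taken from nnproxy or Hadoop; all names and addresses are made up.
 */
public class MountTableSketch {
    // Mount point (normalized absolute path) -> NameNode address.
    private final Map<String, String> mounts = new TreeMap<>();

    public void addMount(String mountPoint, String nameNode) {
        mounts.put(normalize(mountPoint), nameNode);
    }

    /**
     * Returns the NameNode registered for the longest mount point that
     * covers the given path, or null if no mount point covers it.
     */
    public String resolve(String path) {
        String p = normalize(path);
        String best = null;
        int bestLen = -1;
        for (Map.Entry<String, String> e : mounts.entrySet()) {
            String mount = e.getKey();
            if (covers(mount, p) && mount.length() > bestLen) {
                best = e.getValue();
                bestLen = mount.length();
            }
        }
        return best;
    }

    // A mount covers a path only on whole path components,
    // so "/user" covers "/user/alice" but not "/username".
    private static boolean covers(String mount, String path) {
        if (mount.equals("/")) {
            return true;
        }
        return path.equals(mount) || path.startsWith(mount + "/");
    }

    // Strip trailing slashes, but keep the root path "/" intact.
    private static String normalize(String path) {
        String p = path;
        while (p.length() > 1 && p.endsWith("/")) {
            p = p.substring(0, p.length() - 1);
        }
        return p;
    }

    public static void main(String[] args) {
        MountTableSketch table = new MountTableSketch();
        table.addMount("/", "nn-default.example.com:8020");
        table.addMount("/user", "nn1.example.com:8020");
        table.addMount("/user/logs", "nn2.example.com:8020");

        System.out.println(table.resolve("/user/logs/2016/04")); // nn2.example.com:8020
        System.out.println(table.resolve("/user/alice"));        // nn1.example.com:8020
        System.out.println(table.resolve("/username/x"));        // nn-default.example.com:8020
    }
}
```

Keeping the table inside the proxy is what lets clients in any language stay unaware of federation: only the proxy needs the mapping, so a mount-table change does not have to be pushed to every client node.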