I definitely like the idea. Thanks for putting this together, RJ.

- What are the main use cases for WebHDFS, and how do people currently use it in the real world?
- What portions of the FileSystem and FileContext contracts does WebHDFS cover, and can we morph its client to make it HCFS-compatible and leverage our existing GlusterFS-hadoop plugin?

I can help mentor it from the perspective of the Java integration and API usability, and I'm sure we can track down some folks on the C/Gluster side of things who are able to help me with the lower-level details.

> On Mar 18, 2014, at 9:20 PM, RJ Nowling <rnowl...@gmail.com> wrote:
>
> Hi all,
>
> I wanted to follow up. I drafted a proposal for creating a RESTful/JSON API
> and server for GlusterFS similar to WebHDFS. As the number of big data
> processing and storage systems explodes, integration is becoming more
> important. A language and operating system agnostic RESTful/JSON API and
> server could be helpful for easing integration efforts.
>
> I've pasted the proposal below. Is there any interest in the Gluster
> community? Would anyone be willing to serve as a mentor?
>
> Thank you,
> RJ
>
> RESTful/JSON API and Server for GlusterFS
>
> Overview of proposal:
> The goal of the proposal is to create a RESTful/JSON API and server (similar
> to WebHDFS) for GlusterFS.
>
> Need it fulfills:
> Following on the popularity of Hadoop, a number of "big data" processing
> systems (e.g., Berkeley Data Analytics Stack, Storm, Stratosphere, Disco) are
> being created and adopted. These systems are written in a wide range of
> languages such as Java, Scala, Python, and Erlang.
>
> These systems are rarely used in isolation. Maintaining separate distributed
> file systems and databases is laborious, costly, and wasteful. Migrating data
> between separate distributed file systems or databases is difficult, error
> prone, and limits easy access to data when it is needed. As a result, there
> is great interest in integration, as exemplified by projects such as the
> Gluster plugin for Hadoop.
>
> Gluster's existing clients (FUSE, libgfapi) are limited to specific operating
> systems (Linux) and/or require bindings for each programming language of
> interest. RESTful/JSON APIs and servers such as WebHDFS offer a more
> general solution that is independent of the client's operating system and
> programming language. WebHDFS has proven popular and is being used by
> systems such as Disco to add support for HDFS. A RESTful/JSON interface and
> server could offer similar benefits for Gluster and has the potential to
> be just as popular as WebHDFS.
>
> Any relevant experience you have:
> I am familiar with WebHDFS and the Hadoop Gluster plugin. Through my Ph.D.
> research and TA'ing experience, I am familiar with distributed systems (e.g.,
> WorkQueue), client-server systems, and RESTful/JSON APIs. I have some
> experience with CherryPy, a Python web service framework, and using it to
> create RESTful/JSON servers. I am also familiar with the work in Disco to
> add HDFS support through WebHDFS.
>
> How you intend to implement your proposal:
> Aim 1: Design a RESTful/JSON interface that supports the semantics of Gluster.
> The ability to report data locality information will be important for other
> projects that use that information for scheduling workers and tasks.
>
> Aim 2: Create a RESTful/JSON server.
> I will use Python and its libraries such as CherryPy or Flask to develop a
> RESTful server. My preferred option will be to use Python bindings to
> libgfapi as a backend, but I will fall back to using the Gluster FUSE client
> if I run into problems. A dummy backend that uses the local file system will
> be created for testing purposes. (It would be good to support multiple
> backends.)
>
> Aim 3: Create a RESTful/JSON Python library.
> I will create a Python library that uses the RESTful/JSON interface as a
> backend.
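
To make the "dummy backend" in Aim 2 a bit more concrete, here is a minimal sketch of what it might look like, using only the Python standard library. The class name LocalBackend, its method names, and the choice of JSON fields are assumptions made for illustration and are not part of the proposal. The field names are loosely modelled on the FileStatus objects that WebHDFS returns for LISTSTATUS, and only a small subset is shown.

```python
import os
import stat


class LocalBackend:
    """Hypothetical dummy backend that serves a local directory for testing."""

    def __init__(self, root):
        # Resolve symlinks once so later containment checks compare real paths.
        self.root = os.path.realpath(root)

    def _resolve(self, path):
        # Map an API path such as "/data/a.txt" onto the local root and
        # refuse anything that escapes it (e.g. "/../etc/passwd").
        full = os.path.realpath(os.path.join(self.root, path.lstrip("/")))
        if full != self.root and not full.startswith(self.root + os.sep):
            raise ValueError("path escapes backend root: %r" % path)
        return full

    def _status(self, full, name):
        # Build one JSON-serialisable entry (a subset of WebHDFS-style fields).
        st = os.stat(full)
        is_dir = stat.S_ISDIR(st.st_mode)
        return {
            "pathSuffix": name,
            "type": "DIRECTORY" if is_dir else "FILE",
            "length": 0 if is_dir else st.st_size,
            "modificationTime": int(st.st_mtime * 1000),  # milliseconds
        }

    def list_status(self, path):
        # Directory listing, sorted by name so results are deterministic.
        full = self._resolve(path)
        entries = sorted(os.listdir(full))
        statuses = [self._status(os.path.join(full, n), n) for n in entries]
        return {"FileStatuses": {"FileStatus": statuses}}

    def read(self, path):
        # Return the whole file as bytes; a real server would stream it.
        with open(self._resolve(path), "rb") as handle:
            return handle.read()
```

A libgfapi or FUSE backend could expose the same three operations, so the web layer (CherryPy, Flask, or anything else) would not need to know which one it is talking to.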
>
> Aim 4: Create Unit Tests and Benchmarks for Several Use Cases
> As part of my effort, I will write unit tests to ensure that the server and
> client library are implemented correctly. As good performance will be
> important for adoption, I will also document several use cases and perform
> benchmarks to evaluate the performance of the RESTful/JSON server compared
> with the standard FUSE client.
>
> Aim 5: (Optional and time permitting) Work on integration with a big data
> system as a proof of concept
> Option 1: Integrate with Hadoop by mimicking the WebHDFS API so that the
> Hadoop WebHDFS client can transparently use the Gluster RESTful API as a
> backend.
>
> Option 2: Integrate with Disco, an Erlang/Python MapReduce framework.
> Support for HDFS is currently being added using the WebHDFS interface. The
> WebHDFS work provides a good template for adding Gluster support.
>
> --
> em rnowl...@gmail.com
> c 954.496.2314
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@nongnu.org
> https://lists.nongnu.org/mailman/listinfo/gluster-devel
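
For Option 1, "mimicking the WebHDFS API" would largely mean accepting the same URL layout and operation names that WebHDFS clients already send. The sketch below shows that request shape from the client side. The function name build_request and the example host and port are hypothetical. The /webhdfs/v1/ prefix, the op query parameter, and the operation-to-method mapping follow the WebHDFS REST API, though only a few operations are listed here.

```python
from urllib.parse import quote, urlencode

# HTTP method used by WebHDFS for a few common operations (subset only).
HTTP_METHODS = {
    "OPEN": "GET",
    "GETFILESTATUS": "GET",
    "LISTSTATUS": "GET",
    "MKDIRS": "PUT",
    "CREATE": "PUT",
    "APPEND": "POST",
    "DELETE": "DELETE",
}


def build_request(host, port, path, op, **params):
    """Return (http_method, url) for a WebHDFS-style request."""
    op = op.upper()
    if op not in HTTP_METHODS:
        raise ValueError("unsupported operation: %s" % op)
    # "op" goes first; extra parameters are sorted so the URL is deterministic.
    query = urlencode([("op", op)] + sorted(params.items()))
    # quote() keeps "/" but escapes spaces and other unsafe characters.
    url = "http://%s:%d/webhdfs/v1/%s?%s" % (
        host, port, quote(path.lstrip("/")), query)
    return HTTP_METHODS[op], url


if __name__ == "__main__":
    # Hypothetical server address, for illustration only.
    print(build_request("gluster-rest.example.org", 8080,
                        "/data/my file.txt", "open", offset=0, length=1024))
```

If a Gluster server answered these URLs with the JSON that WebHDFS clients expect, the same helper could back the Python library in Aim 3 and existing WebHDFS clients could in principle be pointed at it unchanged. How much of the FileSystem contract that would cover is still an open question.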