I definitely like the idea. Thanks for putting this together, RJ.

- What are the main use cases for WebHDFS, and how do people currently use it
in the real world?

- What portions of the FileSystem and FileContext contracts does WebHDFS cover,
and can we morph its client to make it HCFS compatible and leverage our
existing GlusterFS-hadoop plugin?

I can help mentor it from the perspective of the Java integration and API
usability, and I'm sure we can track down some folks on the C/Gluster
side of things who are able to help me with the lower-level details.
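
To make the contract question concrete, here is a minimal sketch (not part of
RJ's proposal) of the dummy local-file-system backend it mentions for testing.
The LocalBackend class and its handle() method are hypothetical names. The
operation names and JSON shapes are loosely modeled on WebHDFS, and returning
a boolean from CREATE is a simplification of this sketch, so treat it as an
illustration under those assumptions rather than a design.

```python
import json
import os
import tempfile


class LocalBackend:
    """Hypothetical dummy backend: serves files from a local root directory."""

    def __init__(self, root):
        self.root = os.path.realpath(root)

    def _resolve(self, path):
        # Map an API path such as "/a/b" to a path under the root,
        # rejecting anything that would escape it (e.g. "/../etc").
        full = os.path.realpath(os.path.join(self.root, path.lstrip("/")))
        if full != self.root and not full.startswith(self.root + os.sep):
            raise ValueError("path escapes backend root: %r" % path)
        return full

    def _status(self, full, suffix):
        # Build a status record whose field names are modeled on WebHDFS.
        st = os.stat(full)
        is_dir = os.path.isdir(full)
        return {
            "pathSuffix": suffix,
            "type": "DIRECTORY" if is_dir else "FILE",
            "length": 0 if is_dir else st.st_size,
            "modificationTime": int(st.st_mtime * 1000),
        }

    def handle(self, op, path, data=None):
        """Dispatch one operation and return a JSON-serializable dict."""
        full = self._resolve(path)
        if op == "MKDIRS":
            os.makedirs(full, exist_ok=True)
            return {"boolean": True}
        if op == "CREATE":
            with open(full, "wb") as f:
                f.write(data or b"")
            return {"boolean": True}
        if op == "GETFILESTATUS":
            return {"FileStatus": self._status(full, "")}
        if op == "LISTSTATUS":
            names = sorted(os.listdir(full))
            entries = [self._status(os.path.join(full, n), n) for n in names]
            return {"FileStatuses": {"FileStatus": entries}}
        raise ValueError("unsupported op: %r" % op)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        backend = LocalBackend(root)
        backend.handle("MKDIRS", "/data")
        backend.handle("CREATE", "/data/a.txt", b"hello")
        print(json.dumps(backend.handle("LISTSTATUS", "/data"), indent=2))
```

A real server would put an HTTP layer (CherryPy or Flask, as the proposal
suggests) in front of handle() and swap this class for a libgfapi-based one.
Keeping the backend behind a single dispatch method is one way to support
multiple backends.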

> On Mar 18, 2014, at 9:20 PM, RJ Nowling <rnowl...@gmail.com> wrote:
> 
> Hi all,
> 
> I wanted to follow up.  I drafted a proposal for creating a RESTful/JSON API 
> and server for GlusterFS similar to WebHDFS.  As the number of big data 
> processing and storage systems explodes, integration is becoming more
> important.  A language and operating system agnostic RESTful/JSON API and 
> server could be helpful for easing integration efforts.
> 
> I've pasted the proposal below.  Is there any interest in the Gluster
> community?  Would anyone be willing to serve as a mentor?
> 
> Thank you,
> RJ
> 
> RESTful/JSON API and Server for GlusterFS
> 
> Overview of proposal:
> The goal of the proposal is to create a RESTful/JSON API and server (similar 
> to WebHDFS) for GlusterFS. 
> 
> Need it fulfills:
> Following on the popularity of Hadoop, a number of "big data" processing 
> systems (e.g., Berkeley Data Analytics Stack, Storm, Stratosphere, Disco) are
> being created and adopted.  These systems are written in a wide range of 
> languages such as Java, Scala, Python, and Erlang.  
> 
> These systems are rarely used in isolation. Maintaining separate distributed 
> file systems and databases is laborious, costly, and wasteful. Migrating data 
> between separate distributed file systems or databases is difficult and
> error-prone, and limits easy access to data when it is needed. As a result,
> there is great interest in integration, as exemplified by projects such as
> the Gluster plugin for Hadoop.
> 
> Gluster's existing clients (FUSE, libgfapi) are limited to specific operating 
> systems (Linux) and/or require bindings for each programming language of
> interest.  RESTful/JSON APIs and servers such as WebHDFS offer a more
> general solution that is independent of the client's operating system and
> programming language.  WebHDFS has proven popular and is being used by
> systems such as Disco to add support for HDFS.  A RESTful/JSON interface and
> server could offer similar benefits for Gluster and has the potential to
> be just as popular as WebHDFS. 
> 
> Any relevant experience you have:
> I am familiar with WebHDFS and the Hadoop Gluster plugin. Through my Ph.D.
> research and TA'ing experience, I am familiar with distributed systems (e.g., 
> WorkQueue), client-server systems, and RESTful/JSON APIs.  I have some 
> experience with CherryPy, a Python web service framework, and using it to 
> create RESTful/JSON servers. I am also familiar with the work in Disco to
> add HDFS support through WebHDFS.
> 
> How you intend to implement your proposal:
> Aim 1: Design a RESTful/JSON interface that supports the semantics of Gluster.
> The ability to report data locality information will be important for other 
> projects that use that information for scheduling workers and tasks.
> 
> Aim 2: Create a RESTful/JSON server.
> I will use Python and its libraries such as CherryPy or Flask to develop a 
> RESTful server. My preferred option will be to use Python bindings to 
> libgfapi as a backend, but I will fall back to using the Gluster FUSE client 
> if I run into problems.  A dummy backend that uses the local file system will 
> be created for testing purposes. (It would be good to support multiple 
> backends.)  
> 
> Aim 3: Create a RESTful/JSON Python library.
> I will create a Python library that uses the RESTful/JSON interface as a
> backend.
> 
> Aim 4: Create unit tests and benchmarks for several use cases.
> As part of my effort, I will write unit tests to ensure that the server and
> client library are implemented correctly.  As good performance will be
> important for adoption, I will also document several use cases and perform 
> benchmarks to evaluate the performance of the RESTful/JSON server compared 
> with the standard FUSE client. 
> 
> Aim 5: (Optional and time permitting) Work on integration with a big data
> system as a proof-of-concept.
> Option 1: Integrate with Hadoop by mimicking the WebHDFS API so that the
> Hadoop WebHDFS client can transparently use the Gluster RESTful API as a
> backend.
> 
> Option 2: Integrate with Disco, an Erlang/Python MapReduce framework.
> Support for HDFS is currently being added using the WebHDFS interface.  The 
> WebHDFS work provides a good template for adding Gluster support.
> 
> -- 
> em rnowl...@gmail.com
> c 954.496.2314
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@nongnu.org
> https://lists.nongnu.org/mailman/listinfo/gluster-devel