Hi folks, I have just joined the list to volunteer ideas, design and development (and whatever else in the lifecycle) for the Python client for Accumulo.
I have previously developed several RESTful clients and libraries using web.py, and I am about to write another in Tornado (http://www.tornadoweb.org/). I think we could build a very nice, scalable and fast RESTful API for Accumulo on top of Tornado. I would also like to develop a pure Python library for Accumulo, similar to HappyBase for HBase (https://github.com/wbolster/happybase).

I work at Oak Ridge National Lab as a software engineer and tech lead on "big data" projects. I can devote time, possibly bring in more team members, and I would be happy to collaborate. I could certainly start a small wiki outlining the ideas and open them for discussion.

Regards, and please advise,
Edmon

On Wed, May 2, 2012 at 11:31 AM, Jason Trost <[email protected]> wrote:
> I noticed that there are no JIRAs for a python client
> interface/lib/API for Accumulo. How involved would it be to develop
> AND maintain a python client for Accumulo?
>
> I realize that Jython can be used, but I am interested in a native
> python lib that can be used more broadly with systems that don't work
> with Jython.
>
> In order to do this, it seems like we would need to:
> 1. generate the python thrift bindings code (this is trivial)
> 2. develop and maintain the python glue code to use the thrift code
> and python zookeeper code to interact with the various accumulo
> components. The current Java "glue" code looks quite long. How often
> does this code change (in terms of new features or changes in
> protocol, not bug fixes)?
>

I would advise against rewriting the Accumulo client code in Python. The code that finds tablets, retries in case of failure, parallelizes reads and writes, etc., is fairly complex. I think the proxy option is best; David and Eric mentioned REST and Thrift proxies. If we were to go down the route of writing the client code in another language, I think C++ with a C API would be the best option, because many languages can easily bind to a C API.
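To illustrate why a C API works well as a common layer, here is a minimal sketch using Python's standard `ctypes` module. No Accumulo C API exists in this thread, so the sketch binds to `strlen` from the C standard library as a stand-in; a hypothetical Accumulo client library would be loaded and declared the same way. It assumes a POSIX system where the C library can be located.

```python
import ctypes
import ctypes.util

# Stand-in for a hypothetical Accumulo C API: bind a function from the C
# standard library, declared in C as  size_t strlen(const char *s);
def load_libc() -> ctypes.CDLL:
    path = ctypes.util.find_library("c")
    # If the library name cannot be resolved, CDLL(None) on POSIX exposes the
    # symbols already loaded into the running process, which include libc.
    return ctypes.CDLL(path) if path else ctypes.CDLL(None)

libc = load_libc()
# Declaring the C signature lets ctypes convert the argument and the result.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

def c_strlen(text: str) -> int:
    """Length in bytes of the UTF-8 encoding of text, computed by C code."""
    return libc.strlen(text.encode("utf-8"))

print(c_strlen("accumulo"))  # prints 8
```

The pattern is the same for any C library: load the shared object, declare argument and return types, then call. No compiled glue code is needed on the Python side.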

> Ideally the python API would be very similar to the Java interface
> (Connector, Instance, Scanner, BatchScanner, BatchWriter, Key, Value,
> Mutation, etc.).
>
> I guess what I am trying to get at is, does the Accumulo dev community
> think it's worth the time and effort to develop and maintain a python
> API? I personally think it is, in order to help with adoption and
> integration with other systems (Django is the primary system I want to
> be able to use with it). I have some time to help this along, but I
> don't think I have enough time to take this on alone. Is anyone else
> interested in working together on this?
>
> Thanks,
>
> --Jason
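To make "very similar to the Java interface" concrete, here is a minimal pure-Python sketch of two of those classes, Key and Mutation. The class, field and method names (`ColumnUpdate`, `put`, `keys`, `sort_key`) are hypothetical and nothing here talks to a server. The sketch borrows only three points from Accumulo: a key consists of row, column family, column qualifier, column visibility and timestamp; keys sort with timestamps descending; and a mutation groups changes to a single row.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical pure-Python mirrors of Accumulo's Java Key and Mutation
# classes; names and signatures are illustrative, not an existing API.

@dataclass(frozen=True)
class Key:
    row: bytes
    cf: bytes = b""  # column family
    cq: bytes = b""  # column qualifier
    cv: bytes = b""  # column visibility expression
    ts: int = 0      # timestamp

    def sort_key(self) -> Tuple[bytes, bytes, bytes, bytes, int]:
        # Row and column parts sort ascending, timestamps descending,
        # so the newest version of a cell comes first.
        return (self.row, self.cf, self.cq, self.cv, -self.ts)


@dataclass(frozen=True)
class ColumnUpdate:
    cf: bytes
    cq: bytes
    cv: bytes
    ts: Optional[int]  # None: no explicit timestamp set by the client
    value: bytes


@dataclass
class Mutation:
    """A group of changes to a single row."""
    row: bytes
    updates: List[ColumnUpdate] = field(default_factory=list)

    def put(self, cf: bytes, cq: bytes, value: bytes,
            cv: bytes = b"", ts: Optional[int] = None) -> None:
        self.updates.append(ColumnUpdate(cf, cq, cv, ts, value))

    def keys(self, default_ts: int) -> List[Key]:
        """Keys this mutation would write, in sorted order; updates
        without an explicit timestamp get default_ts."""
        ks = [Key(self.row, u.cf, u.cq, u.cv,
                  u.ts if u.ts is not None else default_ts)
              for u in self.updates]
        return sorted(ks, key=Key.sort_key)


m = Mutation(b"user42")
m.put(b"info", b"name", b"Ada", ts=1)
m.put(b"info", b"name", b"Ada L.", ts=2)
m.put(b"info", b"email", b"ada@example.org")
for k in m.keys(default_ts=5):
    print(k.cf, k.cq, k.ts)
```

Keeping these as plain data classes means a Thrift-, REST- or C-backed transport could sit underneath without changing what application code (for example a Django app) sees.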
