Hey Billie,

A word of warning on contribs: one thing to be wary of is the "drive-by contribution". We found in Hadoop and HBase that many contribs were added to Hadoop as part of a research project or other "passing interest", and then not maintained. Since the core committers had very little knowledge of the contrib components, and the authors were no longer actively maintaining them, they ended up as rotting appendages to our codebase. Users would run into issues and then we'd be unable to help them work through them - not good for anyone.
In HBase, we ended up ejecting our contribs to github. This worked out well - some have done OK, others have died off. But the ones that died off had no maintainers anyway - so better to let them die on their own than drag them forward unmaintained in SVN.

We've always had the stance that, if an HBase-related project on github or elsewhere wants to enter contrib, then they can do so provided they have active maintainers who are truly committed to long term maintenance. For example, our REST server module graduated from contrib into a core part of our project, since its maintainers are also HBase committers who run the stuff in production.

Not sure if this is "Apache-like" -- just my opinion as another developer.

-Todd

On Fri, Oct 28, 2011 at 1:52 PM, Billie J Rinaldi <[email protected]> wrote:
> Anthony,
>
> It sounds interesting. I have been thinking about how to start fostering a
> set of contrib projects for Accumulo, but am unsure how we would manage such
> things effectively (e.g. how do we make sure they work? are they versioned
> and released with Accumulo?). Perhaps we could begin to work this out with
> your project.
>
> Billie
>
>
> ----- Original Message -----
>> From: "Anthony Fox" <[email protected]>
>> To: "Accumulo dev" <[email protected]>
>> Sent: Wednesday, October 26, 2011 4:30:40 PM
>> Subject: accumulo geo
>> All,
>>
>> I would like to gauge the interest in an extension to Accumulo to enable
>> geospatial capabilities. Currently, I have developed a schema for storing
>> raster data as tiles in Accumulo and a plugin to Geoserver that allows
>> Accumulo tables that use the specified schema to be exposed as WMS layers
>> for importing into a GIS. This is a natural fit for Accumulo since the
>> individual tiles are not large but the aggregate set of tiles that make up
>> a single layer can become very large. Accumulo packages those tiles into
>> blocks and distributes them around the cloud for quick access and redundant
>> storage. The implementation is in an early state.
>>
>> I am currently investigating the feasibility of implementing an API for
>> storing, querying, and processing vector data in Accumulo. I would like
>> the API to be able to answer nearest neighbor queries, perform on-the-fly
>> reprojections for queries that come in in a particular projection, various
>> standard geospatial transformations such as buffering and finding
>> intersections, etc. My current thought is that the approach would be
>> similar to how PostGIS extends Postgres in that it dictates a schema and
>> storage format and then provides a user level api (a bunch of sql
>> functions) for processing that data. PostGIS also provides an r-tree index
>> implemented on top of GiST to enable geospatial querying. This type of
>> functionality is also a natural fit for Accumulo as r-tree minimum bounding
>> rectangles can map to tablet extents. However, this change would require
>> modifications to core functionality. Some mechanism for hooking in
>> alternative 'extents' may be a technique for dealing with this kind of
>> indexing scheme.
>>
>> Is there any interest in these kinds of geospatial processing capabilities
>> in the Accumulo community and has anyone thought about/implemented some
>> geospatial functions?
>>
>> Thanks,
>> Anthony
>

--
Todd Lipcon
Software Engineer, Cloudera
