All,

I would like to gauge interest in an extension to Accumulo that adds geospatial capabilities. So far I have developed a schema for storing raster data as tiles in Accumulo, along with a GeoServer plugin that exposes Accumulo tables using that schema as WMS layers, which can then be imported into a GIS. This is a natural fit for Accumulo: the individual tiles are not large, but the aggregate set of tiles that makes up a single layer can become very large. Accumulo packages those tiles into blocks and distributes them around the cloud for quick access and redundant storage. The implementation is at an early stage.

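As a rough illustration only of what a tile-per-row layout could look like (this is not the schema itself; the key format, field widths, and the function names `tile_row_key` and `parse_tile_row_key` are all hypothetical), here is a small Python sketch. It assumes tiles are addressed by layer name, zoom level, column, and row, and that zero-padding is used so that the sorted order of the keys matches the numeric order of the tile coordinates.

```python
def tile_row_key(layer: str, zoom: int, col: int, row: int) -> str:
    """Build a hypothetical row key for one raster tile.

    Zero-padding the numeric fields means a lexicographic sort of the keys
    keeps tiles of the same layer and zoom level next to each other.
    """
    if zoom < 0 or col < 0 or row < 0:
        raise ValueError("zoom, col and row must be non-negative")
    if "/" in layer:
        raise ValueError("layer name must not contain the '/' separator")
    return f"{layer}/{zoom:02d}/{col:08d}/{row:08d}"


def parse_tile_row_key(key: str) -> tuple[str, int, int, int]:
    """Invert tile_row_key: split the key back into its four fields."""
    layer, zoom, col, row = key.rsplit("/", 3)
    return layer, int(zoom), int(col), int(row)


if __name__ == "__main__":
    # Example: three tiles from one (made-up) layer at zoom level 5.
    keys = [tile_row_key("elevation", 5, c, 9) for c in (10, 2, 17)]
    for k in sorted(keys):
        print(k)
```

With a layout along these lines, all tiles for one layer and zoom level form a contiguous key range, so a request for a map extent would turn into a small number of range scans.
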
I am currently investigating the feasibility of implementing an API for storing, querying, and processing vector data in Accumulo. I would like the API to be able to answer nearest-neighbor queries, reproject data on the fly for queries that arrive in a particular projection, and perform various standard geospatial operations such as buffering and finding intersections. My current thought is that the approach would be similar to how PostGIS extends Postgres: it dictates a schema and storage format, and then provides a user-level API (a set of SQL functions) for processing that data. PostGIS also provides an R-tree index implemented on top of GiST to enable geospatial querying.

This type of functionality is also a natural fit for Accumulo, since R-tree minimum bounding rectangles can map to tablet extents. However, this change would require modifications to core functionality. Some mechanism for hooking in alternative 'extents' may be one way to support this kind of indexing scheme.

Is there any interest in these kinds of geospatial processing capabilities in the Accumulo community, and has anyone thought about or implemented any geospatial functions?

Thanks,
Anthony
