Re: accumulo geo

Todd Lipcon Fri, 28 Oct 2011 14:07:37 -0700

Hey Billie,

A word of warning on contribs: one thing to be wary of is the "drive
by contribution". We found in Hadoop and HBase that many contribs were
added to Hadoop as part of a research project or other "passing
interest", and then not maintained. Since the core committers had very
little knowledge of the contrib components, and the authors were no
longer actively maintaining them, they ended up as rotting appendages
to our codebase. Users would run into issues and then we'd be unable
to help them work through them - not good for anyone.


In HBase, we ended up ejecting our contribs to github. This worked out
well - some have done OK, others have died off. But the ones that died
off had no maintainers anyway - so better to let them die on their own
than drag them forward unmaintained in SVN. We've always had the
stance that, if an HBase-related project on github or elsewhere wants
to enter contrib, then they can do so provided they have active
maintainers who are truly committed to long term maintenance. For
example, our REST server module graduated from contrib into a core
part of our project, since its maintainers are also HBase committers
who run the stuff in production.

Not sure if this is "Apache-like" -- just my opinion as another developer.

-Todd

On Fri, Oct 28, 2011 at 1:52 PM, Billie J Rinaldi
<[email protected]> wrote:
> Anthony,
>
> It sounds interesting.  I have been thinking about how to start fostering a 
> set of contrib projects for Accumulo, but am unsure how we would manage such 
> things effectively (e.g. how do we make sure they work?  are they versioned 
> and released with Accumulo?).  Perhaps we could begin to work this out with 
> your project.
>
> Billie
>
>
> ----- Original Message -----
>> From: "Anthony Fox" <[email protected]>
>> To: "Accumulo dev" <[email protected]>
>> Sent: Wednesday, October 26, 2011 4:30:40 PM
>> Subject: accumulo geo
>> All,
>>
>> I would like to gauge the interest in an extension to Accumulo to enable
>> geospatial capabilities. Currently, I have developed a schema for storing
>> raster data as tiles in Accumulo and a plugin to Geoserver that allows
>> Accumulo tables that use the specified schema to be exposed as WMS layers
>> for importing into a GIS. This is a natural fit for Accumulo since the
>> individual tiles are not large but the aggregate set of tiles that make up
>> a single layer can become very large. Accumulo packages those tiles into
>> blocks and distributes them around the cloud for quick access and redundant
>> storage. The implementation is in an early state.
>>
>> I am currently investigating the feasibility of implementing an API for
>> storing, querying, and processing vector data in Accumulo. I would like
>> the API to be able to answer nearest neighbor queries, perform on-the-fly
>> reprojections for queries that come in in a particular projection, various
>> standard geospatial transformations such as buffering and finding
>> intersections, etc. My current thought is that the approach would be
>> similar to how PostGIS extends Postgres in that it dictates a schema and
>> storage format and then provides a user level api (a bunch of sql
>> functions) for processing that data. PostGIS also provides an r-tree index
>> implemented on top of GiST to enable geospatial querying. This type of
>> functionality is also a natural fit for Accumulo as r-tree minimum bounding
>> rectangles can map to tablet extents. However, this change would require
>> modifications to core functionality. Some mechanism for hooking in
>> alternative 'extents' may be a technique for dealing with this kind of
>> indexing scheme.
>>
>> Is there any interest in these kinds of geospatial processing capabilities
>> in the Accumulo community and has anyone thought about/implemented some
>> geospatial functions?
>>
>> Thanks,
>> Anthony
>



-- 
Todd Lipcon
Software Engineer, Cloudera
