Hi All, I'm going to be in the San Francisco area on 6, 7 and 8 September and would love the chance to meet some of the HBase users and developers, if anyone is interested.

I work with a Global Biodiversity Information network (GBIF) that has several thousand databases publishing data using well-defined XML standards. We crawl this information and build an index of it. The index currently resides in MySQL, with 180 million records in each of the 2 largest tables, and we are outgrowing MySQL. We already use Hadoop for various processing jobs, and after some recent tests we are about to try HBase as the backend store.

Our cluster is not huge (16 nodes), but I think we make a nice case study, and because we are able to document openly and freely, it could be something to reference from the HBase wikis. We are building search indexes, running statistical reports, annotating records (geocoding, quality control), creating maps (e.g. tile layers), etc., so the output is technically quite interesting, and the data is used for all kinds of scientific analysis. All our code is open source.

We are a small team (3-4 developers), so we would very much like the opportunity to pick the brains of others. We are also keen to help improve HBase; probably more in a testing capacity than by committing, due to our workloads, but we will do whatever we can.

Cheers,
Tim
