Himanshu,

It seems like you might be interested in using Coprocessors for things like low-latency aggregates. This is a big area of interest for some of us, but there has not been much concerted effort in this direction yet. There is plenty to do here for a research project.
Check out: https://issues.apache.org/jira/browse/HBASE-2000
And specifically: https://issues.apache.org/jira/browse/HBASE-1512

JG

> -----Original Message-----
> From: Himanshu Vashishtha [mailto:vashishth...@gmail.com]
> Sent: Thursday, August 19, 2010 11:30 AM
> To: dev@hbase.apache.org
> Cc: u...@hbase.apache.org
> Subject: Re: HBase: project ideas
>
> Hello Stack,
> Thanks for the reply. Please see inline.
>
> Cheers,
> Himanshu
>
> On Thu, Aug 19, 2010 at 11:22 AM, Stack <st...@duboce.net> wrote:
>
> > On Thu, Aug 19, 2010 at 2:47 AM, Himanshu Vashishtha
> > <vashishth...@gmail.com> wrote:
> > > Dear All:
> > > I have been looking around HBase (running/debugging it, etc.) for a couple of
> > > weeks now, and it is fascinating. I am in search of a good project for my
> > > grad studies, focussing around HBase, but am not able to finalize it. I am
> > > looking for some project idea that I can use. It can be a user or a dev
> > > project; I am open to all :)
> > >
> > > One idea (user specific) is to migrate an XQuery-like tool that uses a
> > > relational db schema (there are a bunch of papers suggesting it) to HBase, but
> > > I am not sure whether it is really a judicious use of HBase. Please suggest.
> > >
> >
> > Hello Himanshu.
> >
> > It's hard to make a suggestion when I've no clue as to your interests.
>
> Hadoop fascinates me. I wrote a tool for my lab which indexes a given
> document collection (of plain text files), and then the user can query it using
> four predefined operations... I store those indexes on HDFS using
> MapFiles (to reduce the request-response latency).
>
> > Can you cite some of the papers you mention?
> > So, I want to carry it forward for XML, and I came across two approaches:
> > indexing the doc, OR storing them in a rdbms style while also considering
> > schema info.
>
> Paper (for index based approach): An efficient inverted index technique for
> XML documents using RDBMS, Chiyoung Seo, others..2003.
>
> and for rdbms approach: A Comprehensive XQuery to SQL Translation using
> Dynamic Interval Encoding. David DeHaan, David Toman, Mariano P. Consens,
> others... in 2003, and its references.
>
> I developed a prototype for the index based one in HBase, but it is limited
> in usage (due to its inherent approach of indexing, you can't fire elegant
> operations like summing, grouping etc). It's quite raw.
>
> > + Have you looked at HIVE? It might be more pertinent making this run
> > better atop hbase rather than making a new XQuery-like tool for hbase.
> >
>
> Not yet. I read that it runs a MR job for every query, and it kind of slows
> its response time, so I skipped past it. But yes, it does provide a lot of
> relational schema stuff, I see.
>
> > + Build an app that allows various kinds of location queries using a
> > geohashing+hbase combo. There's a few fellas floating on the list who
> > might be able to help you out on this project.
> >
> > For extra points, whatever you do, build it using hbase-2000 coprocessors.
>
> I am sorry I couldn't get this.
>
> >
> > Thanks for writing the list Himanshu.
> > St.Ack
> >