Thanks for the responses. But I want to emphasize that we are dealing with a relatively broad base here: around 150 million rows and quite a few columns, a few hundred perhaps. I am therefore a bit apprehensive about writing a fresh piece here, since I would need to do a whole lot of testing for it.
Preferably, I am looking for an existing piece that has already seen some testing, if there is one like that.

@Jack - when writing your application in C++, what API did you use there? The standard HBase one? How was the experience? Can you please share a bit more?

Regards
Raghav

-----Original Message-----
From: Jack Levin [mailto:[email protected]]
Sent: Monday, September 27, 2010 10:52 AM
To: [email protected]
Subject: Re: Fast querying mechanism for hbase data ?

You could just write an application in any language that would query your rows, put them in memory, then do any sort of sorting or processing. Use REST api, and you are done. We did an experiment of sorting/querying by using C++, and it was quite impressive with 10k rows.

-Jack

On Sun, Sep 26, 2010 at 9:57 PM, Imran M Yousuf <[email protected]> wrote:
> Hi Raghav,
>
> You could try Apache Solr along with HBase. Apache Solr is designed
> for Full Text search and works in various modes in terms of storing
> indexes.
> http://lucene.apache.org/solr/
> http://github.com/akkumar/hbasene [Provides a distributed system to
> use HBase as the backing store for the TF-IDF representation, as
> needed by Lucene]
> http://www.lilyproject.org/lily/index.html [Cloud-scalable NoSQL-based
> content store and search repository, built on top of Apache HBase and
> SOLR]
>
> If your requirement is not real-time in nature you may also try the
> Scanner API of HBase Client.
> http://hbase.apache.org/docs/r0.89.20100726/apidocs/index.html
>
> Regards,
>
> Imran
>
> On Mon, Sep 27, 2010 at 10:27 AM, Sharma, Raghvendra
> <[email protected]> wrote:
>> I am running a little test/poc here.
>>
>> I need to load a few million rows every day into a database. And it's not
>> log file data, I have comma delimited rows (of columns) which would exactly
>> fit a relational database.
>>
>> After the loading, I need to allow a very fast search mechanism.
>> Looking a bit at Google's implementation of bigtable and structure around it, I
>> originally thought of using hive integrated with hbase. Hive because of its
>> querying capabilities. The loading works out fine, better than RDBMS perf.
>> However, the querying bottleneck, which was the reason to look for
>> alternatives to RDBMS in the first place, continues with hive too.
>>
>> Testing hive for querying is not really blazing performance. Perhaps I need
>> to look for alternatives..
>>
>> Is there something else ? any other tool/solution/library that I can put on
>> top of hbase ? or even without hbase ? (I looked at hbase as an alternative
>> to the RDBMS, moving towards dist computing)
>>
>> Suggestions please...
>>
>> --raghav..
>
> --
> Imran M Yousuf
> Entrepreneur & CEO
> Smart IT Engineering Ltd.
> Dhaka, Bangladesh
> Twitter: @imyousuf - http://twitter.com/imyousuf
> Blog: http://imyousuf-tech.blogs.smartitengineering.com/
> Mobile: +880-1711402557
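
Jack's suggestion in this thread (query the rows over the REST API, put them in memory, then sort or process them) could look roughly like the Python sketch below. It is only an illustration, not what Jack's C++ experiment did. It assumes the JSON shape the HBase REST gateway is generally documented to return (a "Row" list whose "key", "column" and "$" fields are base64-encoded), and it leaves out the HTTP fetch itself. The function names, the sample payload and the "cf:city" column are all hypothetical.

```python
import base64
import json


def _enc(text):
    """Base64-encode a str; used only to build the sample payload below."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def _dec(text):
    """Decode one base64 field of the (assumed) REST JSON into a str."""
    return base64.b64decode(text).decode("utf-8")


def decode_rows(payload):
    """Turn a REST JSON payload into a list of {"key": ..., column: value} dicts."""
    rows = []
    for row in json.loads(payload).get("Row", []):
        record = {"key": _dec(row["key"])}
        for cell in row.get("Cell", []):
            record[_dec(cell["column"])] = _dec(cell["$"])
        rows.append(record)
    return rows


def sort_rows(rows, column):
    """Sort in memory by one column (compared as strings); rows lacking it go last."""
    return sorted(rows, key=lambda r: (column not in r, r.get(column, "")))


# Hand-built stand-in for what the gateway would send back (assumed shape).
SAMPLE = json.dumps({"Row": [
    {"key": _enc("row2"), "Cell": [{"column": _enc("cf:city"), "$": _enc("Pune")}]},
    {"key": _enc("row1"), "Cell": [{"column": _enc("cf:city"), "$": _enc("Dhaka")}]},
]})

if __name__ == "__main__":
    for r in sort_rows(decode_rows(SAMPLE), "cf:city"):
        print(r["key"], r["cf:city"])
```

Everything here is held in memory, so it matches the scale of the 10k-row experiment rather than a table of 150 million rows; at that size the rows would have to be fetched and processed in batches, or indexed elsewhere.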
