[ https://issues.apache.org/jira/browse/HBASE-8016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13596899#comment-13596899 ]

eric baldeschwieler commented on HBASE-8016:
--------------------------------------------

Hi Stack

MiniHBaseCluster sounds like a good way to prototype this.  I've been kicking 
this around with ddas and serge and they suggested that.  

Hi Matt

Noooo .. that would be an interesting project but is going a very different 
direction.

Hi Andrew

LevelDB is the other thing I've been thinking about.  We may do some 
comparison.  But adapting it to use HDFS efficiently may prove non-trivial, and 
I'd want something that can handle a couple of TB of data; it's not clear 
LevelDB fits that bill.

In terms of distributed data store...  we definitely suffer from not having 
simple mechanisms to add good state management of large data sets to simple 
apps around Hadoop.  Often they have a single master and manageable data 
rates.  They are getting built on DBs today, but that really is crufty.  I'm 
looking for a repeatable data management design that doesn't bring all the fun 
of administering either a high-availability RDBMS or a distributed NoSQL store 
into the mix.

Other approaches might be to hack up Derby or SQLite or Postgres, but all of 
these bring more baggage since they are not already HDFS native.  And none 
should scale as well as HBase.
                
> HBase as an embeddable library, but still using HDFS
> ----------------------------------------------------
>
>                 Key: HBASE-8016
>                 URL: https://issues.apache.org/jira/browse/HBASE-8016
>             Project: HBase
>          Issue Type: Wish
>            Reporter: eric baldeschwieler
>
> This goes in the "strange idea" bucket...  
> I'm looking for a tool to allow folks to store key-value data into HDFS so 
> that Hadoop companion layers & apps don't need to rely on either an external 
> database or a NoSQL store.  HBase itself is often not running on such 
> clusters, and we cannot add it as a requirement for many of the use cases I'm 
> considering.
> But...  what if we produced a library that provided the basic HBase API 
> (creating tables & putting / getting values...) and this library was pointed 
> at HDFS for durability.  This library would effectively embed a region server 
> and the master in a node and provide only API-level access within that 
> JVM.  We would skip marshaling & networking, gaining a fair amount of 
> efficiency.  An application using this library would gain all of the 
> advantages of HBase without adding any additional administrative complexity 
> of managing HBase as a distributed service.
> Thoughts?
> Example use cases...  Right now a typical Hadoop install runs several 
> services that use databases (Oozie, HCat, Hive ...).  What if some of these 
> could be ported to use HDFS itself as their store, with the HBase API 
> provided to manage their data?
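To make the proposal concrete, here is a toy sketch (not HBase code) of the shape such an embedded put/get API might take: an in-process table that writes every put to an append-only log on a shared filesystem and recovers its in-memory state by replaying that log on open. A local directory stands in for HDFS here, and all names (`EmbeddedTable`, `wal_dir`) are illustrative assumptions, not part of any HBase API.

```python
# Toy sketch of an "embedded library" key-value store: no server process,
# no networking; durability comes only from the append-only log file.
import os

class EmbeddedTable:
    def __init__(self, name, wal_dir):
        self.name = name
        self.path = os.path.join(wal_dir, name + ".wal")
        self.mem = {}      # in-memory view of the table, like a memstore
        self._replay()     # recover state from the log on open

    def _replay(self):
        if os.path.exists(self.path):
            with open(self.path) as f:
                for line in f:
                    # One put per line: key TAB value (toy format; real
                    # encoding would need escaping for tabs/newlines)
                    key, _, value = line.rstrip("\n").partition("\t")
                    self.mem[key] = value

    def put(self, key, value):
        # Log first, then update memory: a crash after the append still
        # leaves the put recoverable on the next open.
        with open(self.path, "a") as f:
            f.write(f"{key}\t{value}\n")
        self.mem[key] = value

    def get(self, key):
        return self.mem.get(key)
```

Reopening a table against the same log path recovers its contents, which is the core of the idea: state management for a single-master app, with durability delegated to the filesystem rather than to a separately administered database or NoSQL service.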

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
