That link is a pictorial view of what is represented in the HFile. My limited understanding is that what is actually written to the HFile, in terms of bytes, is laid out on a row-by-row basis, but you should not need to get into HFiles.
Cheers,
Tim

On Fri, Jul 31, 2009 at 10:05 AM, Angus He<[email protected]> wrote:
> OK, OK, OK.
>
> If data is stored row-by-row in hbase, how would you explain the text
> under section "Physical Storage View" in
> http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture.
> Is the page stale, or is something else wrong?
>
> On Fri, Jul 31, 2009 at 3:50 PM, Ryan Rawson<[email protected]> wrote:
>> Data is stored row-by-row in the hbase store files (aka hfiles).
>> HBase is not a column-oriented store as described in the wikipedia
>> article: http://en.wikipedia.org/wiki/Column-oriented_DBMS
>>
>> Have a look at the bigtable paper and do some searches; there is lots
>> of material out there describing the benefits of a flexible store like
>> bigtable/hbase.
>>
>> -ryan
>>
>> On Fri, Jul 31, 2009 at 12:42 AM, Angus He<[email protected]> wrote:
>>> Hi Ryan,
>>>
>>> You cannot equate the "column" in that wikipedia article to the
>>> "column" in HBase.
>>>
>>> We should assume that the word "column" in "column-oriented" is
>>> predefined; otherwise, it is meaningless.
>>>
>>> So we should consider the "column" in wikipedia as "column-family" in
>>> HBase. In this way, the article can answer 宏明's question.
>>>
>>> On Fri, Jul 31, 2009 at 3:18 PM, Ryan Rawson<[email protected]> wrote:
>>>> Hey,
>>>>
>>>> The bigtable paper talks more about column families, but in HBase each
>>>> column family is stored in its own file. That means there is disk
>>>> locality for different column families. The canonical use is to put
>>>> web crawl data in one family, and meta data (like derived meta data)
>>>> in another. That way scanning just the meta data is not as expensive
>>>> as scanning the web page crawl dump.
>>>>
>>>> Column families are pre-defined - the "schema" for what it's worth -
>>>> but the 'qualifier' within a family is dynamically determined by the
>>>> client.
>>>>
>>>> In the terminology of the article, hbase would be more 'row oriented',
>>>> but with the column family snag, it isn't that simple. Since rows from
>>>> different families are stored in different files, reading efficiency
>>>> is related to which column families you are reading in a query.
>>>>
>>>> -ryan
>>>>
>>>> On Fri, Jul 31, 2009 at 12:02 AM, Angus He<[email protected]> wrote:
>>>>> Hi Ryan,
>>>>>
>>>>> 1. If that is not the case, what is the purpose of introducing the
>>>>> "column family"?
>>>>> Are the contents of different column families stored in different
>>>>> files in HBase?
>>>>>
>>>>> BTW, in the bigtable paper, we can find the following text:
>>>>> "Access control and both disk and memory accounting are performed at
>>>>> the column-family level."
>>>>>
>>>>> 2. I was wondering if HBase shares the benefits described in the
>>>>> "Benefits" section of the wikipedia article. If not, what is the
>>>>> meaning of "column-stores" in HBase?
>>>>>
>>>>> On Fri, Jul 31, 2009 at 2:30 PM, Ryan Rawson<[email protected]> wrote:
>>>>>> HBase and bigtable are referred to as column-stores, but we aren't a
>>>>>> 'column oriented dbms' as described in the wikipedia.
>>>>>>
>>>>>> At the storage level, hbase stores key-values, where the key is a
>>>>>> triple of row / column / timestamp. Files are ordered lists of these
>>>>>> key/values, and they are sorted in that order, hence rows are stored
>>>>>> together, then sorted by column then reverse by timestamp (newest on
>>>>>> top).
>>>>>>
>>>>>> Thus hbase is not a 'column store' in the sense listed in the wikipedia
>>>>>> entry.
>>>>>>
>>>>>> On Thu, Jul 30, 2009 at 11:23 PM, Angus He<[email protected]> wrote:
>>>>>>> Why don't you try to google it first?
>>>>>>> After googling with the keyword "Column-oriented", the first result is
>>>>>>> exactly what you want.
>>>>>>> http://en.wikipedia.org/wiki/Column-oriented_DBMS
>>>>>>>
>>>>>>> 2009/7/31 <[email protected]>:
>>>>>>>> Hi,
>>>>>>>> Can anyone tell me the benefit of the column-oriented data model?
>>>>>>>> Thank you
>>>>>>>>
>>>>>>>> Fleming
>>>>>>>> 宏明
>>>>>>>> ---------------------------------------------------------------------------
>>>>>>>> TSMC PROPERTY
>>>>>>>> This email communication (and any attachments) is proprietary
>>>>>>>> information for the sole use of its intended recipient. Any
>>>>>>>> unauthorized review, use or distribution by anyone other than the
>>>>>>>> intended recipient is strictly prohibited. If you are not the
>>>>>>>> intended recipient, please notify the sender by replying to this
>>>>>>>> email, and then delete this email and any copies of it immediately.
>>>>>>>> Thank you.
>>>>>>>> ---------------------------------------------------------------------------
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Regards
>>>>>>> Angus
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Regards
>>>>> Angus
>>>>>
>>>>
>>>
>>> --
>>> Regards
>>> Angus
>>>
>>
>
> --
> Regards
> Angus
>
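
For anyone following the thread, the layout Ryan describes (one store file per column family, with each file holding key/values sorted by row, then column, then newest timestamp first) can be sketched roughly in Python. This is only an illustration of the ordering, not HBase code: the function name build_store_files, the tuple layout, and the sample cells are all made up for the example, and the real on-disk key format carries more fields than shown here.

```python
from collections import defaultdict

# Hypothetical in-memory model of the layout described in the thread.
# None of these names come from the HBase API.
def build_store_files(cells):
    """Group cells by column family, then sort each group by row,
    then column qualifier, then timestamp descending (newest first)."""
    files = defaultdict(list)
    for row, family, qualifier, timestamp, value in cells:
        files[family].append(((row, qualifier, timestamp), value))
    for entries in files.values():
        # Negating the timestamp yields newest-first within a row/column.
        entries.sort(key=lambda kv: (kv[0][0], kv[0][1], -kv[0][2]))
    return dict(files)

# Made-up sample data: (row, family, qualifier, timestamp, value).
cells = [
    ("row2", "meta", "lang", 5, "en"),
    ("row1", "content", "html", 3, "<html>v1</html>"),
    ("row1", "meta", "lang", 9, "fr"),
    ("row1", "content", "html", 7, "<html>v2</html>"),
    ("row1", "meta", "lang", 4, "en"),
]

store_files = build_store_files(cells)
for family, entries in sorted(store_files.items()):
    print(family)
    for key, value in entries:
        print("  ", key, value)
```

In this toy model, scanning only the "meta" family never touches the "content" entries, which is the disk-locality point made above, while within each family all versions of a row stay next to each other, which is why the store reads as row-oriented rather than column-oriented in the wikipedia sense.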
