HBase does at least 3 things that traditional databases have a hard time with:
- Large blobs of data. MySQL is particularly guilty of not handling this well.
- Tables that grow to be larger than reasonably priced single machines.
- Write loads that are not compatible with master-slave replication.

The 2nd and 3rd are very interesting, since you either have to pay for something like Oracle RAC, or start sharding.

On Thu, Nov 12, 2009 at 5:58 PM, Imran M Yousuf <[email protected]> wrote:
> On Thu, Nov 12, 2009 at 10:50 PM, Chris Bates
> <[email protected]> wrote:
>> Hi Imran,
>>
>> I'm a new user as well. I found these presentations helpful in answering
>> most of your questions:
>> http://wiki.apache.org/hadoop/HBase/HBasePresentations
>>
>> There are HBase schema designs in there.
>>
>
> I read them, but without the speakers' explanations the schema parts
> remain unexplained for a dumb newbie like me. I was looking for more
> concrete definitions of column family, column, cell etc. and their use
> cases. I guess I will have to learn them by experimenting.
>
>> You might also want to read the original BigTable paper and the chapter on
>> HBase in O'Reilly's Hadoop book.
>>
>> But to answer one of your questions: "Big Data" usually refers to a dataset
>> that is millions to billions of rows in size. But "Big Data" doesn't mean you
>> have to use a tool like HBase. We have some MySQL tables that are 100
>> million rows and work fine. You have to identify what works best for your
>> use case and use the most appropriate tool.
>
> Thanks. IMHO, HBase is more suitable than MySQL for this, simply
> because of the complexity and cost of scaling an application with blob
> data.
>
> Thanks a lot,
>
> Imran
>
>>
>> On Thu, Nov 12, 2009 at 9:13 AM, Imran M Yousuf <[email protected]> wrote:
>>
>>> Hi!
>>>
>>> I am absolutely new to HBase. All I have done is read up on the
>>> documentation and presentations and get a single instance up and
>>> running.
>>> I am starting on a Content Management System which will be
>>> used as a backend for multiple web applications of different natures.
>>> In the CMS:
>>> * Users can define their own kinds of content, known as content types.
>>> * Content can have one-to-many, one-to-one and many-to-many relationships
>>>   with other content.
>>> * Content fields should be versioned.
>>> * Content types can change at runtime, i.e. fields (a.k.a. columns in
>>>   HBase) can be added, but removal will not be allowed just yet.
>>> * Every content type will have a corresponding grammar to validate
>>>   content of its type.
>>> * It will have authentication and authorization.
>>> * It will have full-text search based on Lucene/Katta.
>>>
>>> Based on these requirements I have the following questions that I
>>> would like feedback on:
>>> * From the articles and presentations I have read, HBase looks like a
>>>   perfect match, as it supports multi-dimensional rows, versioned cells
>>>   and dynamic schema modification. But I could not understand the
>>>   definition of "Big Data": if a content item is roughly 1 to 100 kB
>>>   (field/cell size 0 to 100 kB), is HBase meant for such uses?
>>> * Since I am not sure how much load the site will have, I am planning
>>>   to set up DN+RS (DataNode + RegionServer) on Rackspace cloud instances
>>>   with 2 GB/80 GB HDD, with the idea that, as revenue and pageviews
>>>   increase, more moderate "commodity" hardware can be added
>>>   progressively. Any comments/suggestions on this strategy?
>>> * Where can I read up on, or check out sample, RDBMS schemas converted
>>>   to HBase schemas? Basically, I want to read up on efficient schema
>>>   design for relationships of different cardinality between objects.
>>>
>>> Thank you,
>>>
>>> --
>>> Imran M Yousuf
>>> Entrepreneur & Software Engineer
>>> Smart IT Engineering
>>> Dhaka, Bangladesh
>>> Email: [email protected]
>>> Blog: http://imyousuf-tech.blogs.smartitengineering.com/
>>> Mobile: +880-1711402557
>>>
>>
>
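Imran asks for concrete definitions of column family, column and cell. The BigTable paper Chris recommends describes the model as a sparse, sorted, multidimensional map indexed by row key, column and timestamp. One way to picture that is as nested maps: row key -> column family -> qualifier (the "column") -> timestamp -> value. A cell is one (row, family, qualifier) slot holding several timestamped versions. Column families are declared up front, and the number of versions kept is a per-family setting. Qualifiers can be added freely at write time, which is why a CMS can add fields at runtime.

The sketch below is a hypothetical, in-memory illustration of that model only. `SketchTable`, its methods, the `content`/`meta` family names and the `article:000N` row keys are invented for this example. None of it is the HBase client API.

```python
from collections import defaultdict
import time


class SketchTable:
    """Hypothetical in-memory model of HBase's logical data model.

    row key -> column family -> qualifier -> {timestamp: value}
    Families (and how many versions each keeps) are fixed up front;
    qualifiers are created on first write. NOT the HBase client API.
    """

    def __init__(self, families):
        # families: dict of family name -> max versions kept per cell
        self.families = dict(families)
        self.rows = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

    def put(self, row, family, qualifier, value, ts=None):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        ts = time.time_ns() if ts is None else ts
        cells = self.rows[row][family][qualifier]
        cells[ts] = value
        # Evict the oldest versions beyond the family's limit.
        for old in sorted(cells)[:-self.families[family]]:
            del cells[old]

    def _cells(self, row, family, qualifier):
        # .get() avoids creating empty entries for missing keys.
        return self.rows.get(row, {}).get(family, {}).get(qualifier, {})

    def get(self, row, family, qualifier, ts=None):
        """Newest value, or newest value at or before ts; None if absent."""
        cells = self._cells(row, family, qualifier)
        candidates = [t for t in cells if ts is None or t <= ts]
        return cells[max(candidates)] if candidates else None

    def versions(self, row, family, qualifier):
        """All retained versions of a cell, newest first."""
        cells = self._cells(row, family, qualifier)
        return [(t, cells[t]) for t in sorted(cells, reverse=True)]

    def row_keys(self):
        """Row keys in sorted order, the order a scan would visit them."""
        return sorted(self.rows)


if __name__ == "__main__":
    articles = SketchTable({"content": 2, "meta": 1})
    articles.put("article:0001", "content", "title", "Draft title", ts=1)
    articles.put("article:0001", "content", "title", "Final title", ts=2)
    articles.put("article:0001", "content", "title", "Edited title", ts=3)
    # A new qualifier ("subtitle") needs no schema change:
    articles.put("article:0001", "content", "subtitle", "A sketch", ts=4)
    print(articles.get("article:0001", "content", "title"))        # Edited title
    print(articles.get("article:0001", "content", "title", ts=2))  # Final title
```

Because the "content" family keeps only 2 versions here, the ts=1 "Draft title" has already been evicted by the time the third put lands. That per-family retention is the knob that would govern how much field history the CMS keeps.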
