What do you mean by "very large"? One possible source of performance concern is that HBase RPC does not do positioned/chunked/partial reads, so on both the RegionServer and the client the entirety of the value data will be in the heap. A lot of really large objects brought in this way under high concurrency can cause excessive GC from fragmentation, or OOME conditions if the heap isn't adequately sized. The recommendation of ~10 MB max is meant to mitigate these effects. There is nothing scientific about that number though; it's a rule of thumb. I've built HBase applications with a max value size of 100 MB and they performed adequately. (Larger objects were split into 100 MB chunks and keyed as $rowkey$chunk, where $chunk was an integer serialized with Bytes.toBytes().)
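To make the $rowkey$chunk idea concrete, here is a minimal sketch in plain Java with no HBase dependency. The class and method names (ChunkedValue, chunkKey, split, chunkIndex) are hypothetical and made up for illustration. The chunk index is written as a 4-byte big-endian int, which I believe is the same layout Bytes.toBytes(int) produces, but treat that as an assumption and check it against your HBase version.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: splits a large value into chunks
// and builds one row key per chunk as rowkey followed by the chunk index.
final class ChunkedValue {

    // Row key for one chunk: the base row key followed by the chunk index
    // encoded as a 4-byte big-endian int (ByteBuffer's default byte order).
    static byte[] chunkKey(byte[] rowKey, int chunk) {
        return ByteBuffer.allocate(rowKey.length + Integer.BYTES)
                .put(rowKey)
                .putInt(chunk)
                .array();
    }

    // Splits value into consecutive pieces of at most maxChunkSize bytes.
    // An empty value yields an empty list.
    static List<byte[]> split(byte[] value, int maxChunkSize) {
        if (maxChunkSize <= 0) {
            throw new IllegalArgumentException("maxChunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        int off = 0;
        while (off < value.length) {
            int len = Math.min(maxChunkSize, value.length - off);
            chunks.add(Arrays.copyOfRange(value, off, off + len));
            off += len;
        }
        return chunks;
    }

    // Recovers the chunk index from the last 4 bytes of a chunk key.
    static int chunkIndex(byte[] chunkKey) {
        return ByteBuffer
                .wrap(chunkKey, chunkKey.length - Integer.BYTES, Integer.BYTES)
                .getInt();
    }
}
```

In an application, each element of split(value, 100 * 1024 * 1024) would be written under chunkKey(rowKey, i) with an ordinary Put; that part is omitted here. Because the index is big-endian and non-negative, the chunk keys for one base row key sort in chunk order, so a reader can fetch them in sequence and concatenate.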
Another source of concern is a consequence of the fact that a row cannot be split. If the data in a single row grows significantly larger than the region split threshold, that one region will be sized differently from the others, and this can lead to unexpected behavior. Consider a split threshold of 2 GB and a single row that contains 10 GB as one really large value. This is undesirable because HBase expects the housekeeping cost for a given region (compaction, etc.) to be more or less equal to that of the others.

From the application's point of view, a few really big outlier values can act like land mines if the app does short scans over table data: Gets or Scans that include such values will have very different latency from the rest. But this would be an application design problem.

On Sun, Jan 6, 2013 at 12:28 PM, Asaf Mesika wrote:

> What's the penalty performance wise of saving a very large value in a
> KeyValue in hbase? Splits, scans, etc.
>
> On Jan 6, 2013, at 22:12, Andrew Purtell wrote:
>
> > Also YFrog / ImageShack serves all of its assets out of HBase too, so for
> > reasonably sized images some are having success. See
> > http://www.slideshare.net/jacque74/hug-hbase-presentation
> >
> > On Sun, Jan 6, 2013 at 3:58 AM, Yusup Ashrap wrote:
> >
> >> there are a lot great discussions on Quora on this topic.
> >>
> >> http://www.quora.com/Apache-Hadoop/Is-HBase-appropriate-for-indexed-blob-storage-in-HDFS
> >> http://www.quora.com/Is-it-possible-to-use-HDFS-HBase-to-serve-images
> >> http://www.quora.com/What-is-a-good-choice-for-storing-blob-like-files-in-a-distributed-environment

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
