Hi Jack, thank you for sharing!
Hello Andrew,

You mentioned an interesting topic: caching. My question is why I would need a cache between HBase and HDFS if I already have a cache configured between HBase and its caller application. Say I have a web application that uses HBase as its backend data source, and I have a cache configured in the reverse proxy in front of my web server, keyed on URL patterns or parameters. In that case, the cached data is delivered to the client whenever the same URL/parameters come in, so the same data cached behind the web server will not be hit again. If that is the case, I would say a cache between HBase and HDFS will not be helpful. But I suspect the real case is not as simple as I described above. Can you please expand a little on the caching topic?

thanks and regards,

Yiyu

On Mon, Jan 28, 2013 at 1:58 PM, Andrew Purtell <[email protected]> wrote:

If I were to design a large object store on HBase, I would do the following: under a threshold, store the object data in HBase; over the threshold, store only metadata for the object in HBase, and the object data itself in a file in HDFS. The threshold could be a fixed byte size like 100 MB, or you could segment storage by MIME type, for example image/* into HBase and video/* into HDFS. Video objects might be as large as 5-10 GB for full-length features, depending on encoding bitrate. HBase can pack millions or billions of small objects into much larger indexed files that can be quickly retrieved, and this helps avoid namespace pressure on the HDFS NameNode. However, the HBase API cannot do positioned reads of partial byte ranges of stored objects, while the HDFS API can. Put smaller objects into HBase. Put larger objects into HDFS so you can stream them at approximately the same rate that the end user reads them, and minimize overhead for server-side buffering. As Jack mentions, there is Hoop (https://github.com/cloudera/hoop) or WebHDFS (http://hadoop.apache.org/docs/stable/webhdfs.html) for accessing HDFS via a RESTful API. Both will let you do positioned reads of partial byte ranges out of HDFS. On the HBase side, there is HBase's REST interface (http://wiki.apache.org/hadoop/Hbase/Stargate). Put a cache in between the HDFS and HBase services and the front end, because even with the capabilities of HBase and HDFS you should always have a caching tier between the datastore and the front end.
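To make the scheme concrete, here is a minimal Java sketch of that routing logic. It is an illustration only: the "objects" table, the column layout, and the ObjectStore class are assumptions for the example, not from Andrew's description, and error handling is omitted. It uses the 0.9x-era HBase client API and the HDFS FileSystem API; the last method shows the positioned partial-range read that HDFS offers (also reachable over WebHDFS with ?op=OPEN&offset=N&length=M) and HBase does not.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class ObjectStore {
  // Assumed layout: table "objects", family "o", qualifiers "data" and "hdfs.path".
  private static final long THRESHOLD = 100L * 1024 * 1024; // 100 MB, per the example above
  private final FileSystem fs;
  private final HTable table;

  public ObjectStore(Configuration conf) throws IOException {
    this.fs = FileSystem.get(conf);
    this.table = new HTable(HBaseConfiguration.create(conf), "objects");
  }

  public void store(String key, byte[] data) throws IOException {
    Put put = new Put(Bytes.toBytes(key));
    if (data.length < THRESHOLD) {
      // Small object: the bytes live directly in an HBase cell.
      put.add(Bytes.toBytes("o"), Bytes.toBytes("data"), data);
    } else {
      // Large object: bytes go to an HDFS file; HBase keeps only the metadata.
      Path path = new Path("/objects/" + key);
      FSDataOutputStream out = fs.create(path);
      try { out.write(data); } finally { out.close(); }
      put.add(Bytes.toBytes("o"), Bytes.toBytes("hdfs.path"),
          Bytes.toBytes(path.toString()));
    }
    table.put(put);
  }

  // Positioned read of a partial byte range out of HDFS -- the operation
  // the HBase API cannot do against a stored value.
  public int readRange(String hdfsPath, long offset, byte[] buf) throws IOException {
    FSDataInputStream in = fs.open(new Path(hdfsPath));
    try {
      return in.read(offset, buf, 0, buf.length);
    } finally {
      in.close();
    }
  }
}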
On Sun, Jan 27, 2013 at 8:56 AM, Jack Levin <[email protected]> wrote:

We did some experiments; the open source project Hoop works well for interfacing with HDFS, exposing a REST API to your file system.

-Jack

On Sun, Jan 27, 2013 at 7:37 AM, yiyu jia <[email protected]> wrote:

Hi Jack,

Thanks so much for sharing! Do you have comments on storing video in HDFS?

thanks and regards,

Yiyu

On Sat, Jan 26, 2013 at 9:56 PM, Jack Levin <[email protected]> wrote:

AFAIK, the namenode would not like tracking 20 billion small files :)

-jack

On Sat, Jan 26, 2013 at 6:00 PM, S Ahmed <[email protected]> wrote:

That's pretty amazing.

What I am confused about is: why did you go with HBase and not just straight into HDFS?

On Fri, Jan 25, 2013 at 2:41 AM, Jack Levin <[email protected]> wrote:

Two people, including myself; it's fairly hands-off. It took about 3 months to tune it right; however, we did have multiple years of experience with datanodes and Hadoop in general, so that was a good boost.

We have 4 HBase clusters today, the image store being the largest.

On Jan 24, 2013 2:14 PM, "S Ahmed" <[email protected]> wrote:

Jack, out of curiosity, how many people manage the HBase-related servers?

Does it require constant monitoring, or is it fairly hands-off now? (Or a bit of both: the early days were about getting things right and learning, and now it's purring along.)

On Wed, Jan 23, 2013 at 11:53 PM, Jack Levin <[email protected]> wrote:

It's best to keep some RAM for caching of the filesystem; besides, we also run the datanode, which takes heap as well. Now, please keep in mind that even if you specify a heap of, say, 5 GB, if your server opens threads to communicate with other systems via RPC (which HBase does a lot), you will actually use HEAP + Nthreads * per-thread stack size. There is a good Sun Microsystems document about it (I don't have the link handy).

-Jack

On Mon, Jan 21, 2013 at 5:10 PM, Varun Sharma <[email protected]> wrote:

Thanks for the useful information. I wonder why you use only a 5G heap when you have an 8G machine? Is there a reason not to use all of it (the DataNode typically takes 1G of RAM)?

On Sun, Jan 20, 2013 at 11:49 AM, Jack Levin <[email protected]> wrote:

I forgot to mention that I also have this setup:

<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>33554432</value>
  <description>Flush more often. Default: 67108864</description>
</property>

This parameter applies per region, so if any of my 400 (currently) regions on a regionserver accumulates 30 MB+ in its memstore, HBase will flush it to disk.

Here are some metrics from a regionserver:

requests=2, regions=370, stores=370, storefiles=1390,
storefileIndexSize=304, memstoreSize=2233, compactionQueueSize=0,
flushQueueSize=0, usedHeap=3516, maxHeap=4987,
blockCacheSize=790656256, blockCacheFree=255245888,
blockCacheCount=2436, blockCacheHitCount=218015828,
blockCacheMissCount=13514652, blockCacheEvictedCount=2561516,
blockCacheHitRatio=94, blockCacheHitCachingRatio=98

Note that the memstore is only 2G; this particular regionserver's HEAP is set to 5G.

And last but not least, it's very important to have a good GC setup:

export HBASE_OPTS="$HBASE_OPTS -verbose:gc -Xms5000m \
  -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError \
  -Xloggc:$HBASE_HOME/logs/gc-hbase.log \
  -XX:MaxTenuringThreshold=15 -XX:SurvivorRatio=8 \
  -XX:+UseParNewGC \
  -XX:NewSize=128m -XX:MaxNewSize=128m \
  -XX:-UseAdaptiveSizePolicy \
  -XX:+CMSParallelRemarkEnabled \
  -XX:-TraceClassUnloading"

-Jack
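(A back-of-the-envelope check of why the small per-region flush size matters here, using the numbers above: 400 regions at a 32 MB flush threshold each could in theory hold 400 * 32 MB = 12.8 GB of memstore, far more than a 5 GB heap. Flushing each region at ~30 MB keeps the aggregate bounded well below that, which is consistent with the memstoreSize=2233, about 2.2 GB, reported in the metrics.)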
On Thu, Jan 17, 2013 at 3:29 PM, Varun Sharma <[email protected]> wrote:

Hey Jack,

Thanks for the useful information. By flush size being 15%, do you mean the memstore flush size? 15% would mean close to 1G; have you seen any issues with flushes taking too long?

Thanks
Varun

On Sun, Jan 13, 2013 at 8:17 AM, Jack Levin <[email protected]> wrote:

That's right: the memstore size, not the flush size, is increased. Filesize is 10G. The overall write cache is 60% of heap and the read cache is 20%; the flush size is 15%. 64 maxlogs at 128 MB. One namenode server, and one secondary that can be promoted. On the way to HBase, images are written to a queue, so that we can take HBase down for maintenance and still do inserts later. ImageShack has 'perma cache' servers that allow writes and serving of data even when HBase is down for hours; consider it a 4th replica outside of Hadoop 😉

Jack

From: Mohit Anchlia <[email protected]>
Sent: January 13, 2013 7:48 AM
To: [email protected]
Subject: Re: Storing images in Hbase

Thanks Jack for sharing this information. This definitely makes sense when using that type of caching layer. You mentioned increasing the write cache; I am assuming you had to increase the following parameters in addition to increasing the memstore size:

hbase.hregion.max.filesize
hbase.hregion.memstore.flush.size
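As a rough illustration of the knobs involved, the ratios Jack describes would correspond to hbase-site.xml settings along these lines. The property names are from the HBase 0.92/0.94 era this thread discusses, but the values here are inferred from the percentages he quotes, not copied from his actual config:

<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.6</value>
  <description>Write cache: ~60% of heap (default 0.4).</description>
</property>
<property>
  <name>hfile.block.cache.size</name>
  <value>0.2</value>
  <description>Read (block) cache: ~20% of heap (default 0.25).</description>
</property>
<property>
  <name>hbase.regionserver.maxlogs</name>
  <value>64</value>
  <description>Keep up to 64 WALs before forcing flushes.</description>
</property>
<property>
  <name>hbase.regionserver.hlog.blocksize</name>
  <value>134217728</value>
  <description>128 MB WAL block size.</description>
</property>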
On Fri, Jan 11, 2013 at 9:47 AM, Jack Levin <[email protected]> wrote:

We buffer all accesses to HBase with a Varnish SSD-based caching layer, so the impact for reads is negligible. We have a 70 node cluster with 8 GB of RAM per node, relatively weak nodes (Intel Core 2 Duo), with 10-12 TB of disk per server. We are inserting 600,000 images per day. We have relatively little compaction activity, as we made our write cache much larger than our read cache, so we don't experience region file fragmentation as much.

-Jack

On Fri, Jan 11, 2013 at 9:40 AM, Mohit Anchlia <[email protected]> wrote:

I think it really depends on the volume of the traffic, the data distribution per region, how and when file compaction occurs, and the number of nodes in the cluster. In my experience, when it comes to blob data where you are serving tens of thousands of requests/sec of writes and reads, it's very difficult to manage HBase without very hard operations and maintenance in play. Jack earlier mentioned they have 1 billion images; it would be interesting to know what they see in terms of compaction and number of requests per second. I'd be surprised if a high-volume site could do this without any caching layer on top to alleviate the IO spikes that occur because of GC and compactions.

On Fri, Jan 11, 2013 at 7:27 AM, Mohammad Tariq <[email protected]> wrote:

IMHO, if the image files are not too huge, HBase can efficiently serve the purpose. You can store some additional info along with the file, depending upon your search criteria, to make the search faster. Say, if you want to fetch images by type, you can store the image in one column and its extension in another column (jpg, tiff, etc.).

BTW, what exactly is the problem you are facing? You have written "But I still cant do it"?

Warm Regards,
Tariq
https://mtariq.jux.com/
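A minimal sketch of that row layout, assuming a hypothetical "images" table with one family "d" (the names are illustrative, not from the thread):

import java.io.IOException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class ImageRow {
  // Store the image bytes in one column and the extension in another,
  // so callers can fetch or filter by type without parsing the blob.
  public static void store(HTable images, String key, byte[] bytes, String ext)
      throws IOException {
    Put put = new Put(Bytes.toBytes(key));
    put.add(Bytes.toBytes("d"), Bytes.toBytes("img"), bytes);
    put.add(Bytes.toBytes("d"), Bytes.toBytes("ext"), Bytes.toBytes(ext));
    images.put(put);
  }
}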
On Fri, Jan 11, 2013 at 8:30 PM, Michael Segel <[email protected]> wrote:

That's a viable option. HDFS reads are faster than HBase, but it would require first hitting the index in HBase, which points to the file, and then fetching the file. It could be faster... We found storing binary data in a sequence file indexed in HBase to be faster than HBase alone; however, YMMV, and HBase has been improved since we did that project...

On Jan 10, 2013, at 10:56 PM, shashwat shriparv <[email protected]> wrote:

Hi Kavish,

I have a better idea for you: copy your image files into a single file on HDFS, and when a new image comes, append it to the existing file; then keep and update the metadata and the offset in HBase. Because if you put bigger images in HBase, it will lead to issues.

∞
Shashwat Shriparv
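For illustration, a bare-bones Java sketch of that append-and-index scheme (the table and column names are made up for the example; it assumes HDFS append is enabled, and a real deployment would also have to handle concurrent appenders and file rollover):

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class PackedImageStore {
  private static final byte[] FAM = Bytes.toBytes("m");
  private final FileSystem fs;
  private final Path packFile; // the single big HDFS file holding all images
  private final HTable index;  // HBase table mapping image key -> (offset, length)

  public PackedImageStore(FileSystem fs, Path packFile, HTable index) {
    this.fs = fs;
    this.packFile = packFile;
    this.index = index;
  }

  public void append(String key, byte[] image) throws IOException {
    FSDataOutputStream out = fs.append(packFile);
    long offset = out.getPos(); // where this image will start
    try { out.write(image); } finally { out.close(); }
    Put put = new Put(Bytes.toBytes(key)); // record offset + length in HBase
    put.add(FAM, Bytes.toBytes("offset"), Bytes.toBytes(offset));
    put.add(FAM, Bytes.toBytes("len"), Bytes.toBytes(image.length));
    index.put(put);
  }

  public byte[] read(String key) throws IOException {
    Result r = index.get(new Get(Bytes.toBytes(key)));
    long offset = Bytes.toLong(r.getValue(FAM, Bytes.toBytes("offset")));
    int len = Bytes.toInt(r.getValue(FAM, Bytes.toBytes("len")));
    byte[] buf = new byte[len];
    FSDataInputStream in = fs.open(packFile);
    try { in.readFully(offset, buf); } finally { in.close(); }
    return buf; // a positioned read fetches just this image's byte range
  }
}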
On Fri, Jan 11, 2013 at 9:21 AM, lars hofhansl <[email protected]> wrote:

Interesting. That's close to a PB if my math is correct.
Is there a write-up about this somewhere? Something that we could link from the HBase homepage?

-- Lars

----- Original Message -----
From: Jack Levin <[email protected]>
To: [email protected]
Cc: Andrew Purtell <[email protected]>
Sent: Thursday, January 10, 2013 9:24 AM
Subject: Re: Storing images in Hbase

We stored about 1 billion images in HBase, with file sizes up to 10 MB. It has been running for close to 2 years without issues and serves delivery of images for Yfrog and ImageShack. If you have any questions about the setup, I would be glad to answer them.

-Jack

On Sun, Jan 6, 2013 at 1:09 PM, Mohit Anchlia <[email protected]> wrote:

I have done extensive testing and have found that blobs don't belong in databases but are rather best left out on the file system. Andrew outlined the issues that you'll face, not to mention the IO issues when compaction occurs over large files.

On Sun, Jan 6, 2013 at 12:52 PM, Andrew Purtell <[email protected]> wrote:

I meant this to say "a few really large values".

On Sun, Jan 6, 2013 at 12:49 PM, Andrew Purtell <[email protected]> wrote:

Consider if the split threshold is 2 GB but your one row contains 10 GB as a really large value. (A row is never split across regions, so such a region can never be split back under the threshold.)

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
