The answer is HDFS-200 and changes to HLog. You should start considering what your 0.90 upgrade plan will be; it is imperative that within 3 months no one is running 0.20.6 or earlier. Backporting the features of 0.90 to 0.20.x is not the right direction and would take essentially as much effort as creating 0.90.
To help the adoption we are using 0.89 at Stumbleupon in production and will be one of the first users of 0.90 as it comes out.

-ryan

On Thu, Oct 14, 2010 at 7:05 PM, Ted Yu <[email protected]> wrote:
> J-D:
> If you can briefly point out the code in 0.89 which makes using WAL more
> reliable, that would be great.
>
> Thanks
>
> On Thu, Oct 14, 2010 at 5:51 PM, Jean-Daniel Cryans
> <[email protected]> wrote:
>
>> Even if HBaseAdmin.flush is made synchronous, that won't get you far
>> since it's still processed sequentially on the region servers. A
>> better well-known option is to set hbase.regionserver.hlog.blocksize
>> to a small number, and if you want high durability you could set that
>> to 1KB (basically rolling at every new insert). Since this is
>> incredibly inefficient, a more wide-spread number (and one we used
>> while we were on 0.20) is 2MB. Set it higher if you have a high insert
>> rate, or lower if you don't insert very often.
>>
>> J-D
>>
>> On Thu, Oct 14, 2010 at 8:36 PM, Ted Yu <[email protected]> wrote:
>> > We're still using 0.20.6 :-)
>> >
>> > On Thu, Oct 14, 2010 at 5:19 PM, Jean-Daniel Cryans <[email protected]> wrote:
>> >
>> >> If your Puts are using the WAL, and you are on 0.89, it's already as
>> >> durable as it can be without forcing flushes.
>> >>
>> >> J-D
>> >>
>> >> On Thu, Oct 14, 2010 at 8:07 PM, Ted Yu <[email protected]> wrote:
>> >> > Hi,
>> >> > HBaseAdmin.flush() is asynchronous.
>> >> > In order to achieve high durability, do I have a better choice ?
>> >> >
>> >> > Thanks
>> >> >
>> >>
>> >
>
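
For reference, J-D's suggestion quoted above (hbase.regionserver.hlog.blocksize at 2MB on 0.20) might look roughly like this in hbase-site.xml on the region servers. This is only a sketch: it assumes the value is given in bytes (2MB = 2097152) and that the region servers are restarted to pick it up; neither detail is stated in the thread.

```xml
<!-- hbase-site.xml fragment (sketch; value assumed to be in bytes) -->
<property>
  <name>hbase.regionserver.hlog.blocksize</name>
  <!-- 2 * 1024 * 1024 = 2097152, the 2MB figure mentioned in the thread -->
  <value>2097152</value>
</property>
```

As the quoted mail says, raise the value for a high insert rate and lower it for infrequent inserts.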
