I would guess clock skew. Do all the machines have approximately the same time? A few seconds of difference is acceptable, but not more.
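
A rough sketch of that check, assuming you can grab each machine's epoch time at roughly the same moment (for example by running `date +%s` on every node). The host names, the function name `hosts_with_skew`, and the 5-second threshold are all hypothetical, chosen only to illustrate "a few seconds"; this is not an HBase API.

```python
from typing import Dict, List


def hosts_with_skew(epoch_by_host: Dict[str, float],
                    reference: str,
                    max_skew_s: float = 5.0) -> List[str]:
    """Return hosts whose clock differs from the reference host by more
    than max_skew_s seconds (threshold is an illustrative assumption)."""
    ref_time = epoch_by_host[reference]
    return sorted(
        host
        for host, t in epoch_by_host.items()
        if abs(t - ref_time) > max_skew_s
    )


# Hypothetical readings, in seconds since the epoch, sampled together.
samples = {"master": 1000.0, "rs1": 1002.0, "rs2": 1042.0}
print(hosts_with_skew(samples, "master"))
```

Any host that shows up in the result is worth syncing (NTP or similar) before digging further.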

J-D

On Thu, Jul 22, 2010 at 4:34 PM, Vladimir Rodionov <vrodio...@carrieriq.com> wrote:
> Has anybody encountered this particular bug before?
> We have been seeing it intermittently in our small QA cluster.
>
> We run a flow which is basically a custom ETL process over data stored in
> HDFS. Yes, it is a bunch of M/R jobs.
> One of the jobs stores data into HBase (0.20.3); the next one loads data from
> HBase (using a scan), performs additional transformations,
> and finally stores the data into an RDBMS.
>
> The flow works fine (most of the time). That means new HBase tables are
> created, data is loaded, and it can be read afterwards during the next M/R job.
>
> After the flow finishes, data from the tables (but not the tables themselves)
> sometimes mysteriously disappears. This is not deterministic, and to get the
> data back we need to RESTART the HBase cluster.
> So an HBase restart fixes the problem.
>
> The cluster is small (3 servers). RAM is limited: 8GB. There are only 2 CPU
> cores per server, but the input data size is small as well, and the average
> size of the disappearing tables is several thousand rows;
> they are small. Hadoop is from CDH2. I cannot give you any additional helpful
> information at this time (no log files), but maybe somebody has encountered
> this before and has an idea how to fix it.
>
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodio...@carrieriq.com
>