Are your tables big? If not, can you tar them up and tell us what doesn't work, and we'll try it over here with 0.20.3 and 0.20.5? A pretty radical change in how we do gets and scans went in between 0.20.3 and 0.20.5, but it should be improving things rather than making them worse; at least it's been good for the rest of us.
St.Ack

On Tue, Jul 27, 2010 at 6:38 PM, Vladimir Rodionov <vrodio...@carrieriq.com> wrote:
> I can not give you any data points right now except one: this happens only in
> 0.20.5.
> I have run the tests against 0.20.3 and everything was OK.
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodio...@carrieriq.com
>
> ________________________________________
> From: saint....@gmail.com [saint....@gmail.com] On Behalf Of Stack
> [st...@duboce.net]
> Sent: Tuesday, July 27, 2010 10:41 AM
> To: dev@hbase.apache.org
> Subject: Re: Data disappears and re-appears again after HBase cluster restart
>
> On Tue, Jul 27, 2010 at 10:26 AM, Vladimir Rodionov
> <vrodio...@carrieriq.com> wrote:
>> Yes, we set timestamps on all Puts. The vast majority of timestamps are in
>> the past (several minutes from now()) and only a small fraction is in the
>> future (and this future will never come - it's pretty close to Long.MAX_VALUE)
>
> When you run a scan, do you set the starttime to include these Puts
> that are in the future?
>
>> But we have now clocks synced on all servers so I do not think this can
>> explain the issue. Besides this, we do not set timestamps when we do inserts
>> into one particular table and this table disappears as well (and reappears
>> after restart)
>>
>
> This I cannot explain. I don't see this phenomenon at all. Restart
> should have no effect on the data being carried by the cluster. Can
> you dig around some more and get us some more data points?
>
> St.Ack
>
>> Best regards,
>> Vladimir Rodionov
>> Principal Platform Engineer
>> Carrier IQ, www.carrieriq.com
>> e-mail: vrodio...@carrieriq.com
>>
>> ________________________________________
>> From: saint....@gmail.com [saint....@gmail.com] On Behalf Of Stack
>> [st...@duboce.net]
>> Sent: Monday, July 26, 2010 11:05 PM
>> To: dev@hbase.apache.org
>> Subject: Re: Data disappears and re-appears again after HBase cluster restart
>>
>> Vladimir:
>>
>> Are you setting times on cells you add to HBase? If so, could these
>> be in the future as far as the regionserver is concerned? For
>> example, perhaps you are setting the version/timestamp on a client
>> whose clock is different from that over on the RegionServer, then when
>> we scan, we miss these future values?
>>
>> Do you have to restart the cluster? What happens if you just wait?
>> Does the data come back then?
>>
>> St.Ack
>>
>>
>> On Mon, Jul 26, 2010 at 6:14 PM, Vladimir Rodionov
>> <vrodio...@carrieriq.com> wrote:
>>> We are running ntpd on all servers and clocks are in sync now but it has
>>> not fixed the problem.
>>> I run the flow, then check
>>>
>>> hbase shell
>>>> count 'tableX'
>>> 0 rows
>>>
>>> after HBase restart I am able to get the 'right' number of rows in a table
>>>
>>> For some tables I get a wrong number of rows that is always less than the
>>> actual number of rows, for others I get 0 rows.
>>> It always goes away after HBase restart. All tables are small in size and
>>> all are newly created during our flow execution.
>>>
>>> I have checked many times Master and Region server's log files but apart
>>> from:
>>>
>>> RegionNotServingException -META- (or -ROOT-) I can see nothing suspicious.
>>>
>>> In Region servers log files I see a lot of messages like this one:
>>> 2010-07-26 17:05:43,751 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>>> Finished memstore flush of ~114.4k for region
>>> 10__HB_NOINC_ORCL_JDBC_0726_MEJOMEJO-ERROR_COUNTS-1280187791424-0,,1280187802112
>>> in 985ms, sequence id=309833, compaction requested=false
>>>
>>> This is during the cluster's shutdown operation.
>>>
>>> Best regards,
>>> Vladimir Rodionov
>>> Principal Platform Engineer
>>> Carrier IQ, www.carrieriq.com
>>> e-mail: vrodio...@carrieriq.com
>>>
>>> ________________________________________
>>> From: jdcry...@gmail.com [jdcry...@gmail.com] On Behalf Of Jean-Daniel
>>> Cryans [jdcry...@apache.org]
>>> Sent: Thursday, July 22, 2010 5:43 PM
>>> To: dev@hbase.apache.org
>>> Subject: Re: Data disappears and re-appears again after HBase cluster
>>> restart
>>>
>>> Data doesn't disappear, it's probably just hidden behind a delete or
>>> something like that (the user mailing list contains reports of events
>>> like that that were fixed by running NTP on all machines, as required
>>> by the Getting Started guide
>>> http://hbase.apache.org/docs/r0.20.5/api/overview-summary.html#requirements).
>>>
>>> This article gives good info about timestamps in HBase
>>> http://outerthought.org/blog/417-ot.html
>>>
>>> J-D
>>>
>>> On Thu, Jul 22, 2010 at 5:29 PM, Vladimir Rodionov
>>> <vrodio...@carrieriq.com> wrote:
>>>> Yes, I just checked all 3 servers and their clocks are not synchronized
>>>> (up to 2 min diff)
>>>> Can you please elaborate a little bit more: how can this result in data
>>>> disappearance?
>>>>
>>>> Best regards,
>>>> Vladimir Rodionov
>>>> Principal Platform Engineer
>>>> Carrier IQ, www.carrieriq.com
>>>> e-mail: vrodio...@carrieriq.com
>>>>
>>>> ________________________________________
>>>> From: jdcry...@gmail.com [jdcry...@gmail.com] On Behalf Of Jean-Daniel
>>>> Cryans [jdcry...@apache.org]
>>>> Sent: Thursday, July 22, 2010 4:38 PM
>>>> To: dev@hbase.apache.org
>>>> Subject: Re: Data disappears and re-appears again after HBase cluster
>>>> restart
>>>>
>>>> I would guess clock skew, all the machines have approx the same time?
>>>> A few seconds is acceptable, but not more.
>>>>
>>>> J-D
>>>>
>>>> On Thu, Jul 22, 2010 at 4:34 PM, Vladimir Rodionov
>>>> <vrodio...@carrieriq.com> wrote:
>>>>> Has anybody encountered this particular bug before?
>>>>> We have been having this intermittently in our small QA cluster.
>>>>>
>>>>> We run a flow which is basically a custom ETL process over data stored in
>>>>> hdfs. Yes it is a bunch of M/R jobs.
>>>>> One of the jobs stores data into HBase (0.20.3), the next one loads data
>>>>> from HBase (using scan), performs additional transformations
>>>>> and finally stores data into RDBMS.
>>>>>
>>>>> Flow works fine (most of the time). It means that new HBase tables are
>>>>> created, data is loaded and can be read after that during the next M/R job
>>>>>
>>>>> After flow finishes, data from tables (but not the tables themselves)
>>>>> sometimes mysteriously disappears. This is not deterministic and to get
>>>>> data back we need to RESTART HBase cluster.
>>>>> So HBase restart fixes the problem.
>>>>>
>>>>> Cluster is small (3 servers). RAM is limited - 8GB. Only 2 CPU cores per
>>>>> server but input data size is small as well and the average size of
>>>>> disappearing tables is several 1000s rows -
>>>>> they are small. Hadoop is from CDH2. I can not get you any additional
>>>>> helpful information at the time (no log files), but maybe somebody has
>>>>> encountered this before and has an idea how to fix it.
>>>>>
>>>>>
>>>>> Best regards,
>>>>> Vladimir Rodionov
>>>>> Principal Platform Engineer
>>>>> Carrier IQ, www.carrieriq.com
>>>>> e-mail: vrodio...@carrieriq.com
>>>>>
>>>>
>>>
>>
>
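
For anyone reading this thread later, the two timestamp hypotheses raised above (a put hidden behind a delete marker carrying a later timestamp, and a put stamped far in the future falling outside a scan's time range) can be illustrated with a small toy model. This is only a sketch in plain Java: the class TimestampMaskingSketch and its put/delete/read methods are made up for illustration and are not the HBase client API, and the model ignores regions, flushes and compactions. It does not explain why a restart brings the data back.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Toy model of the versions of a single cell. Hypothetical; NOT the HBase API. */
public class TimestampMaskingSketch {

    /** One stored version: a put carrying a value, or a delete marker. */
    private static final class Version {
        final long timestamp;
        final boolean isDelete;
        final String value;

        Version(long timestamp, boolean isDelete, String value) {
            this.timestamp = timestamp;
            this.isDelete = isDelete;
            this.value = value;
        }
    }

    private final List<Version> versions = new ArrayList<>();

    public void put(long timestamp, String value) {
        versions.add(new Version(timestamp, false, value));
    }

    public void delete(long timestamp) {
        versions.add(new Version(timestamp, true, null));
    }

    /**
     * Returns the newest put with minTs <= timestamp < maxTs, unless a delete
     * marker with an equal or later timestamp covers it. In this simplified
     * model delete markers apply regardless of the requested time range.
     */
    public Optional<String> read(long minTs, long maxTs) {
        long newestDelete = Long.MIN_VALUE;
        for (Version v : versions) {
            if (v.isDelete) {
                newestDelete = Math.max(newestDelete, v.timestamp);
            }
        }
        final long deleteTs = newestDelete;
        return versions.stream()
                .filter(v -> !v.isDelete)
                .filter(v -> v.timestamp >= minTs && v.timestamp < maxTs)
                .filter(v -> v.timestamp > deleteTs)
                .max(Comparator.comparingLong(v -> v.timestamp))
                .map(v -> v.value);
    }

    public static void main(String[] args) {
        // Case 1: clock skew. A machine running 2 minutes ahead stamps a delete;
        // a put written afterwards on a machine with a correct clock gets an
        // earlier timestamp and stays hidden behind the delete marker.
        TimestampMaskingSketch skewed = new TimestampMaskingSketch();
        skewed.delete(1_000_120_000L);
        skewed.put(1_000_000_000L, "row-1");
        System.out.println("skewed read: " + skewed.read(0L, Long.MAX_VALUE));

        // Case 2: a put stamped close to Long.MAX_VALUE is missed by a read
        // whose upper bound is "now", but found by an unbounded read.
        TimestampMaskingSketch future = new TimestampMaskingSketch();
        future.put(Long.MAX_VALUE - 1, "future-row");
        long now = 1_280_000_000_000L; // illustrative wall-clock value in ms
        System.out.println("bounded read: " + future.read(0L, now));
        System.out.println("unbounded read: " + future.read(0L, Long.MAX_VALUE));
    }
}
```

In the first case the put is accepted but never returned, which from the client's side looks like missing data. In the second the data is present but only reachable when the read's upper bound is raised past the put's timestamp.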