Re: Data disappears and re-appears again after HBase cluster restart

Stack Tue, 27 Jul 2010 20:41:45 -0700

Are your tables big?

If not, can you tar them up and tell us what doesn't work, and we'll
try it over here w/ 0.20.3 and 0.20.5?  A pretty radical change in how
we do gets and scans went in between 0.20.3 and 0.20.5, but it should
be improving things rather than making things worse -- at least it's been
good for the rest of us.


St.Ack


On Tue, Jul 27, 2010 at 6:38 PM, Vladimir Rodionov
<vrodio...@carrieriq.com> wrote:
> I can not give you any data points right now except one: this happens only in 
> 0.20.5.
> I have run the tests against 0.20.3 and everything was OK.
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodio...@carrieriq.com
>
> ________________________________________
> From: saint....@gmail.com [saint....@gmail.com] On Behalf Of Stack 
> [st...@duboce.net]
> Sent: Tuesday, July 27, 2010 10:41 AM
> To: dev@hbase.apache.org
> Subject: Re: Data disappears and re-appears again after HBase cluster restart
>
> On Tue, Jul 27, 2010 at 10:26 AM, Vladimir Rodionov
> <vrodio...@carrieriq.com> wrote:
>> Yes, we set timestamps on all Puts. The vast majority of timestamps are in 
>> the past (several minutes from now()) and only a small fraction is in the 
>> future (and this future will never come - it's pretty close to Long.MAX_VALUE)
>
> When you run a scan, do you set the starttime to include these Puts
> that are in the future?
>
>> But we now have clocks synced on all servers, so I do not think this can 
>> explain the issue. Besides this, we do not set timestamps when we do inserts 
>> into one particular table and this table disappears as well (and reappears 
>> after restart)
>>
>
> This I cannot explain.  I don't see this phenomenon at all.  Restart
> should have no effect on the data being carried by the cluster.  Can
> you dig around some more and get us some more data points?
>
> St.Ack
>
>> Best regards,
>> Vladimir Rodionov
>> Principal Platform Engineer
>> Carrier IQ, www.carrieriq.com
>> e-mail: vrodio...@carrieriq.com
>>
>> ________________________________________
>> From: saint....@gmail.com [saint....@gmail.com] On Behalf Of Stack 
>> [st...@duboce.net]
>> Sent: Monday, July 26, 2010 11:05 PM
>> To: dev@hbase.apache.org
>> Subject: Re: Data disappears and re-appears again after HBase cluster restart
>>
>> Vladimir:
>>
>> Are you setting times on cells you add to HBase?  If so, could these
>> be in the future as far as the regionserver is concerned?  For
>> example, perhaps you are setting the version/timestamp on a client
>> whose clock is different from that over on the RegionServer, then when
>> we scan, we miss these future values?
>>
>> Do you have to restart the cluster?  What happens if you just wait?
>> Does the data come back then?
>>
>> St.Ack
>>
>>
>> On Mon, Jul 26, 2010 at 6:14 PM, Vladimir Rodionov
>> <vrodio...@carrieriq.com> wrote:
>>> We are running ntpd on all servers and clocks are in sync now but it has 
>>> not fixed the problem.
>>> I run the flow, then check
>>>
>>> hbase shell
>>>> count 'tableX'
>>> 0 rows
>>>
>>> after HBase restart I am able to get the 'right' number of rows in a table
>>>
>>> For some tables I get a wrong number of rows that is always less than the 
>>> actual number of rows; for others I get 0 rows.
>>> It always goes away after HBase restart. All tables are small in size and 
>>> all are newly created during our flow execution.
>>>
>>> I have checked many times Master and Region server's log files but apart 
>>> from:
>>>
>>> RegionNotServingException -META- (or -ROOT-) I can see nothing suspicious.
>>>
>>> In Region servers log files I see a lot of messages like this one:
>>> 2010-07-26 17:05:43,751 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
>>> Finished memstore flush of ~114.4k for region 
>>> 10__HB_NOINC_ORCL_JDBC_0726_MEJOMEJO-ERROR_COUNTS-1280187791424-0,,1280187802112
>>>  in 985ms, sequence id=309833, compaction requested=false
>>>
>>> This is during the cluster's shutdown operation.
>>>
>>> Best regards,
>>> Vladimir Rodionov
>>> Principal Platform Engineer
>>> Carrier IQ, www.carrieriq.com
>>> e-mail: vrodio...@carrieriq.com
>>>
>>> ________________________________________
>>> From: jdcry...@gmail.com [jdcry...@gmail.com] On Behalf Of Jean-Daniel 
>>> Cryans [jdcry...@apache.org]
>>> Sent: Thursday, July 22, 2010 5:43 PM
>>> To: dev@hbase.apache.org
>>> Subject: Re: Data disappears and re-appears again after HBase cluster 
>>> restart
>>>
>>> Data doesn't disappear, it's probably just hidden behind a delete or
>>> something like that (the user mailing list contains reports of events
>>> like that that were fixed by running NTP on all machines, as required
>>> by the Getting Started guide
>>> http://hbase.apache.org/docs/r0.20.5/api/overview-summary.html#requirements).
>>>
>>> This article gives good info about timestamps in HBase
>>> http://outerthought.org/blog/417-ot.html
>>>
>>> J-D
>>>
>>> On Thu, Jul 22, 2010 at 5:29 PM, Vladimir Rodionov
>>> <vrodio...@carrieriq.com> wrote:
>>>> Yes, I just checked all 3 servers and their clocks are not synchronized 
>>>> (up to 2 min diff)
>>>> Can you please elaborate a little bit more:  how can this result in data 
>>>> disappearance?
>>>>
>>>> Best regards,
>>>> Vladimir Rodionov
>>>> Principal Platform Engineer
>>>> Carrier IQ, www.carrieriq.com
>>>> e-mail: vrodio...@carrieriq.com
>>>>
>>>> ________________________________________
>>>> From: jdcry...@gmail.com [jdcry...@gmail.com] On Behalf Of Jean-Daniel 
>>>> Cryans [jdcry...@apache.org]
>>>> Sent: Thursday, July 22, 2010 4:38 PM
>>>> To: dev@hbase.apache.org
>>>> Subject: Re: Data disappears and re-appears again after HBase cluster 
>>>> restart
>>>>
>>>> I would guess clock skew, all the machines have approx the same time?
>>>> A few seconds is acceptable, but not more.
>>>>
>>>> J-D
>>>>
>>>> On Thu, Jul 22, 2010 at 4:34 PM, Vladimir Rodionov
>>>> <vrodio...@carrieriq.com> wrote:
>>>>> Has anybody encountered this particular bug before?
>>>>> We have been having this intermittently in our small QA cluster.
>>>>>
>>>>> We run a flow which is basically a custom ETL process over data stored in 
>>>>> hdfs. Yes, it is a bunch of M/R jobs.
>>>>> One of the jobs stores data into HBase (0.20.3), the next one loads data 
>>>>> from HBase (using scan), performs additional transformations
>>>>> and finally stores data into RDBMS.
>>>>>
>>>>> Flow works fine (most of the time). It means that new HBase tables are 
>>>>> created, data is loaded and can be read after that during the next M/R job
>>>>>
>>>>> After the flow finishes, data from tables (but not the tables themselves) 
>>>>> sometimes mysteriously disappears. This is not deterministic and to get 
>>>>> data back we need to RESTART HBase cluster.
>>>>> So HBase restart fixes the problem.
>>>>>
>>>>> Cluster is small (3 servers). RAM is limited - 8GB. Only 2 CPU cores per 
>>>>> server, but input data size is small as well and the average size of 
>>>>> disappearing tables is several thousand rows -
>>>>> they are small. Hadoop is from CDH2. I cannot give you any additional 
>>>>> helpful information at this time (no log files), but maybe somebody has 
>>>>> encountered this
>>>>> before and has an idea how to fix it.
>>>>>
>>>>>
>>>>> Best regards,
>>>>> Vladimir Rodionov
>>>>> Principal Platform Engineer
>>>>> Carrier IQ, www.carrieriq.com
>>>>> e-mail: vrodio...@carrieriq.com
>>>>>
>>>>
>>>
>>
>
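
For anyone reading this thread later, here is a rough sketch of the two
timestamp effects raised in the quoted messages: a put hidden behind a delete
marker that carries a later timestamp (the clock-skew case), and a put with a
far-future timestamp falling outside a scan's time range. It is a toy
in-memory model in Python, not HBase code. The Cell class and visible_rows
function are made up for illustration, and the rules it assumes (a row delete
masks every put with a timestamp less than or equal to its own, and a scan
time range is half-open, [min, max)) are a simplification of real HBase
behaviour.

```python
from dataclasses import dataclass

MAX_TS = 2**63 - 1  # same value as Java's Long.MAX_VALUE


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for a stored cell: a put or a row delete marker."""
    row: str
    timestamp: int          # milliseconds, as assigned by client or server
    is_delete: bool = False


def visible_rows(cells, min_ts=0, max_ts=MAX_TS):
    """Return the sorted rows that have at least one visible put.

    Assumed rules (a simplification):
      * a delete marker masks every put in the same row whose
        timestamp is <= the marker's timestamp;
      * the scan time range is half-open: min_ts <= timestamp < max_ts.
    """
    # Find the newest delete marker for each row.
    newest_delete = {}
    for cell in cells:
        if cell.is_delete:
            previous = newest_delete.get(cell.row, -1)
            newest_delete[cell.row] = max(previous, cell.timestamp)

    rows = set()
    for cell in cells:
        if cell.is_delete:
            continue
        in_range = min_ts <= cell.timestamp < max_ts
        masked = cell.timestamp <= newest_delete.get(cell.row, -1)
        if in_range and not masked:
            rows.add(cell.row)
    return sorted(rows)


cells = [
    # Delete stamped by a machine whose clock runs two minutes ahead.
    Cell("r1", 1_000_120_000, is_delete=True),
    # Put written afterwards, but stamped by a machine with the correct clock.
    Cell("r1", 1_000_000_000),
    # Ordinary put with no delete in the way.
    Cell("r2", 1_000_000_000),
    # Put with a timestamp close to Long.MAX_VALUE.
    Cell("r3", MAX_TS - 1),
]

print(visible_rows(cells))                        # full time range
print(visible_rows(cells, max_ts=1_000_060_000))  # range ending near "now"
```

In this model r1 never shows up, even though its put was written after the
delete in wall-clock terms, and r3 shows up only when the scan's upper bound
reaches past its far-future timestamp. Neither effect explains by itself why
a restart would bring rows back.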
