Amit, thanks: HBASE-4485 describes the behavior I'm seeing. Looking over the patches, I'm under the impression that HBASE-4485, which is a subtask of HBASE-2856, was backported to 0.92 by Lars through HBASE-4838. Am I wrong?
Thanks,
Cosmin

On 2/14/12 11:06 PM, "Amitanand Aiyer" wrote:

>Hi Cosmin,
> https://issues.apache.org/jira/browse/HBASE-4485 might be applicable.
>
> The patch was included in the fix for HBASE-2856.
>
>Cheers,
>-Amit
>
>________________________________________
>From: Cosmin Lehene
>Sent: Tuesday, February 14, 2012 12:02 PM
>Subject: Re: MR job "randomly" scans up thousands of rows less than the it should.
>
>I just got back to this issue. Initially, the behavior we've seen (missing
>rows) wouldn't reproduce on 0.90 using TestAcidGuarantees.
>However, if the puts in the writer threads include additional rows, the
>scanners start reading fewer rows. This reproduces consistently on
>0.90 and seems to work correctly on 0.92.
>
>HBASE-2856/HBASE-4838 are probably the solution, although there's a chance
>it's some other fix in 0.92 (ideas?).
>
>We're undecided whether backporting to 0.90 or upgrading the affected
>clusters to 0.92 would be better.
>Also, is there interest in this fix for 0.90?
>
>Thanks,
>Cosmin
>
>On 2/6/12 6:25 PM, "Cosmin Lehene" wrote:
>
>>Thanks Ted!
>>
>>I wonder if it would make more sense to port it to 0.90.x or to upgrade to
>>0.92.
>>
>>Cosmin
>>
>>On 2/2/12 5:03 PM, "Ted Yu" wrote:
>>
>>>HBASE-4838 ports HBASE-2856 to 0.92.
>>>
>>>FYI
>>>
>>>On Thu, Feb 2, 2012 at 4:46 PM, Cosmin Lehene wrote:
>>>
>>>> (sorry for the damaged subject :))
>>>>
>>>>
>>>> Hey Jon,
>>>> We have two column families.
>>>> There are no filters, and it's a full table scan. We're not skipping
>>>> rows.
>>>> I did, however, see a single time that we had one qualifier "fault" in
>>>> the job counters (it was missing, and it wasn't supposed to be missing).
>>>> However, that happened only once, and it doesn't happen when we encounter
>>>> missing rows.
>>>>
>>>> We're getting this behavior consistently, although I couldn't figure out a
>>>> way to reproduce it. I'll try running multiple instances of the job in
>>>> parallel to see whether that affects the outcome.
>>>> I'll probably have to add more debugging for the affected rows and dig
>>>> deeper.
>>>>
>>>> HBASE-2856 is a pretty large issue; do you think it could be related to
>>>> what I'm seeing? If so, it could help me reproduce it.
>>>>
>>>> Thanks,
>>>> Cosmin
>>>>
>>>>
>>>> On 2/1/12 11:30 PM, "Jonathan Hsieh" wrote:
>>>>
>>>> >Cosmin,
>>>> >
>>>> >How many column families do you have in this table? Are you using any
>>>> >filters in your HBase scans? Are you skipping rows that may not have
>>>> >qualifiers present?
>>>> >
>>>> >There are a few known issues with multi-CF atomicity, and a recent one
>>>> >about flushes that may be related to this problem. There is HBASE-2856,
>>>> >a fix having to do with flushes, which is pretty intricate and only in 0.92.
>>>> >
>>>> >Jon.
>>>> >
>>>> >On Wed, Feb 1, 2012 at 8:46 PM, Cosmin Lehene wrote:
>>>> >
>>>> >> We have an MR job that runs every few minutes on some time-series data
>>>> >> that is continuously updated (never deleted).
>>>> >> Every few runs (in the range of tens to hundreds), the map task that
>>>> >> covers the last region gets fewer input records (off by 500-5000 rows)
>>>> >> without any splits happening. This lower number of input records can
>>>> >> persist for a few MR runs, but it eventually gets back to the "correct"
>>>> >> value.
>>>> >>
>>>> >> This drop shows up both in the "map input records" metric and in the
>>>> >> metrics computed by the MR job (so it's not an MR counter bug).
>>>> >>
>>>> >> There are no exceptions in the MR job or in the region server, and this
>>>> >> doesn't seem to be correlated with any compaction, split, or region
>>>> >> movement.
>>>> >> The only "variable" in this scenario is that new data gets injected
>>>> >> continuously (apart from the MR job itself, which is idempotent).
>>>> >>
>>>> >> This entire puzzle takes place on HBase 0.90.5-ish (Dec 12, 2011) on top
>>>> >> of Hadoop cdh3u2.
>>>> >>
>>>> >> Cosmin
>>>> >
>>>> >
>>>> >--
>>>> >// Jonathan Hsieh (shay)
>>>> >// Software Engineer, Cloudera
