So after an offline discussion and some more discussion on IRC, it was found that the problem was similar to http://issues.apache.org/jira/browse/HBASE-29 and was caused by clock skew. The fact that they set their own timestamps exacerbates the problem because the different clients had wildly different dates; if the region server were setting the ts, it would be more consistent.
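To make the failure mode concrete, here is a rough toy model in Python. It is not HBase code, and the lookup rules are assumptions made for illustration only: the hypothetical `toy_get` stops at the first hit in the most recently written store, while the hypothetical `toy_scan` merges every store and picks the highest timestamp. The timestamps and values are invented.

```python
# Toy model only, NOT HBase internals. Each store maps a cell key to
# (timestamp, value). stores[0] is the most recently written store.

def toy_get(stores, key):
    """Assumed behaviour: return the first hit, newest-written store first."""
    for store in stores:
        if key in store:
            return store[key]
    return None

def toy_scan(stores, key):
    """Assumed behaviour: merge all stores, return the highest timestamp."""
    versions = [store[key] for store in stores if key in store]
    return max(versions, key=lambda v: v[0]) if versions else None

KEY = "row/attribute:url"

# Client A's clock runs far ahead. It wrote first, so its cell is in the older store.
older_store = {KEY: (2000, "written first by client A (fast clock)")}
# Client B's clock is behind. It wrote later, so its cell is in the newest store.
newer_store = {KEY: (1000, "written second by client B (slow clock)")}
skewed = [newer_store, older_store]

# If one server clock stamps every write, write order and timestamp order agree.
server_stamped = [{KEY: (2, "second write")}, {KEY: (1, "first write")}]

print("skewed get :", toy_get(skewed, KEY))    # client B's cell
print("skewed scan:", toy_scan(skewed, KEY))   # client A's cell
print("server get :", toy_get(server_stamped, KEY))
print("server scan:", toy_scan(server_stamped, KEY))
```

With skewed client timestamps the two lookups disagree; with timestamps from a single clock they return the same cell.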
The resolution for the user is to resolve the clock skew and on the HBase side we need to make the get behave more like the scan.

J-D

On Fri, Jan 22, 2010 at 12:11 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
> We do set an explicit timestamp, and I understand that we may be among the
> few in this regard. We haven't performed any deletes on those rows. I will
> try flushing and let you know...
>
> On Fri, Jan 22, 2010 at 1:52 PM, Stack <st...@duboce.net> wrote:
>
>> How were cells inserted? With explicit timestamp? Any deletes
>> floating around? If you flush the region, does the behavior change?
>> (See 'tools' in the shell.... do hbase> flush 'regionname'... you'll
>> have to figure out the region that is hosting the row you are looking
>> at). Can you bundle up the region that these cells are in and pass it
>> to us somehow?
>> St.Ack
>>
>> On Fri, Jan 22, 2010 at 7:56 AM, Joost Ouwerkerk <jo...@openplaces.org>
>> wrote:
>> > We're seeing some dangerously inconsistent behaviour in retrieving data from
>> > HBase. In particular circumstances whose conditions are still unclear, get
>> > and scan (without timestamp params) are returning different versions of a
>> > column. We are running 0.20.2. See below for evidence.
>> >
>> > hbase(main):006:0> scan 'generated_pages',{STARTROW=>'240:http://com.golflink.www/golf-courses/course.aspx?course=1008656',LIMIT=>2,COLUMNS=>['attribute:url']}
>> > ROW COLUMN+CELL
>> > 240:http://com.golflink.www/golf-courses/course.aspx?course=1008656 column=attribute:url, timestamp=*5429280163307928320*, value=\001http://www.golflink.com/golf-courses/course.aspx?course=1008656
>> > 2 row(s) in 0.0100 seconds
>> >
>> > hbase(main):007:0> get 'generated_pages', '240:http://com.golflink.www/golf-courses/course.aspx?course=1008656', COLUMN=>'attribute:url'
>> > timestamp=*5429243797819101088*, value=\001http://www.golflink.com/golf-courses/course.aspx?course=1008656
>> > 1 row(s) in 0.0020 seconds
>> >
>> > Any ideas about how this is possible?
>> >
>> > joost.