Speculative execution is on by default. http://hbase.apache.org/book.html#mapreduce.specex
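
Note that speculative execution is a MapReduce feature, so it only matters if the scan runs inside a MapReduce job rather than from a plain client. If a job is involved, here is a minimal sketch of switching it off. It assumes the Hadoop 1.x property names (later Hadoop releases renamed them), set either in mapred-site.xml or on the job configuration:

```xml
<!-- Assumption: Hadoop 1.x property names; both default to true. -->
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
```
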
On 3/23/12 8:04 AM, "Peter Wolf" <opus...@gmail.com> wrote:

>Hi Michel,
>
>I agree it doesn't make sense, but then I believe we are tracking a bug.
>
>I don't know about speculative execution, but I certainly did not switch
>it on.
>
>I am just counting the number of rows that come back in the Result.
>
>If you are interested in this, try my unit test. I'd be very interested
>to see if it behaves the same for others.
>
>http://dl.dropbox.com/u/68001072/HBaseScanCacheBug.java
>
>
>Here is the output. It shows how the number of results and key value
>pairs varies as caching is changed, and families are included. It shows
>the bug starting with 3 families and 5000 caching. It also shows a new
>bug, where the query always fails with an IOException with 4 families.
>
>CacheSize  FamilyCount  ResultCount  KeyValueCount
>1000       1            10000        10
>5000       1            10000        10
>
>
>On 3/23/12 7:55 AM, Michel Segel wrote:
>> Peter, that doesn't make sense.
>>
>> I mean I believe you in what you are saying, but don't see how a VPN
>> would cause this variance in results.
>>
>> Do you have any speculative execution turned on?
>>
>> Are you counting just the number of rows in the result set, or are you
>> using counters in the map reduce? (I'm assuming that you are running a
>> map/reduce, and not just a simple connection and single threaded
>> scan...).
>>
>> I apologize if this had already been answered, I hadn't been following
>> this too closely.
>>
>> Sent from a remote device. Please excuse any typos...
>>
>> Mike Segel
>>
>> On Mar 22, 2012, at 8:01 PM, Peter Wolf <opus...@gmail.com> wrote:
>>
>>> Hello again Lars and Lars,
>>>
>>> Here is some additional information that may help you track this down.
>>>
>>> I think this behavior has something to do with my VPN. My servers are
>>> on the Amazon Cloud and I normally run my client on my laptop via a VPN
>>> (Tunnelblick: OS X 10.7.3; Tunnelblick 3.2.3 (build 2891.2932)). This
>>> is where I see the buggy behavior I describe.
>>>
>>> However, when my client is running on an EC2 machine, then I get
>>> different behavior. I cannot prove that it is always correct, but in
>>> at least one case my current code does not work on my laptop, but gets
>>> the correct number of results on an EC2 machine. Note that my scans
>>> are also much faster on the EC2 machine.
>>>
>>> I will do more tests to see if I can localize it further.
>>>
>>> Hope this helps
>>> Thank you again
>>> Peter
>>>
>>>
>>> On 3/19/12 2:24 PM, Peter Wolf wrote:
>>>> Hello Lars and Lars,
>>>>
>>>> Thank you for your help and attention.
>>>>
>>>> I wrote a standalone test that exhibits the bug.
>>>>
>>>> http://dl.dropbox.com/u/68001072/HBaseScanCacheBug.java
>>>>
>>>> Here is the output. It shows how the number of results and key value
>>>> pairs varies as caching is changed, and families are included. It
>>>> shows the bug starting with 3 families and 5000 caching. It also
>>>> shows a new bug, where the query always fails with an IOException with
>>>> 4 families.
>>>>
>>>> CacheSize  FamilyCount  ResultCount  KeyValueCount
>>>> 1000       1            10000        10
>>>> 5000       1            10000        10