I was hoping to rule out changes in IPC handlers and other upper layers and narrow it down to the difference between HFileV1 and HFileV2, but it sounds like you have a lot of moving pieces.

On Thu, Dec 15, 2011 at 12:24 PM, Jean-Daniel Cryans <[email protected]> wrote:

> Trying this now.
>
> J-D
>
> On Thu, Dec 15, 2011 at 11:35 AM, Lars <[email protected]> wrote:
> > Do you see the same slowdown with the default 64k block size?
> >
> > Lars <[email protected]> wrote:
> >
> >>I'll be busy today... I'll double check my scanning related changes as
> >>soon as I can.
> >>
> >>Jean-Daniel Cryans <[email protected]> wrote:
> >>
> >>>Yes and yes.
> >>>
> >>>J-D
> >>>On Dec 14, 2011 5:52 PM, "Matt Corgan" <[email protected]> wrote:
> >>>
> >>>> Regions are major compacted and have empty memstores, so no merging of
> >>>> stores when reading?
> >>>>
> >>>>
> >>>> 2011/12/14 Jean-Daniel Cryans <[email protected]>
> >>>>
> >>>> > Yes sorry 1.1M
> >>>> >
> >>>> > This is PE, the table is set to a block size of 4KB and block caching
> >>>> > is disabled. Nothing else special in there.
> >>>> >
> >>>> > J-D
> >>>> >
> >>>> > 2011/12/14 <[email protected]>:
> >>>> > > Thanks for the info, J-D.
> >>>> > >
> >>>> > > I guess the 1.1 below is in millions.
> >>>> > >
> >>>> > > Can you tell us more about your tables - bloom filters, etc.?
> >>>> > >
> >>>> > >
> >>>> > >
> >>>> > > On Dec 14, 2011, at 5:26 PM, Jean-Daniel Cryans <[email protected]> wrote:
> >>>> > >
> >>>> > >> Hey guys,
> >>>> > >>
> >>>> > >> I was doing some comparisons between 0.90.5 and 0.92.0, mainly
> >>>> > >> regarding reads. The numbers are kinda irrelevant but the differences
> >>>> > >> are. BTW this is on CDH3u3 with random reads.
> >>>> > >>
> >>>> > >> In 0.90.0, scanning 50M rows that are in the OS cache I go up to about
> >>>> > >> 1.7M rows scanned per second.
> >>>> > >>
> >>>> > >> In 0.92.0, scanning those same rows (meaning that I didn't run
> >>>> > >> compactions after migrating so it's picking the same data from the OS
> >>>> > >> cache), I scan about 1.1 rows per second.
> >>>> > >>
> >>>> > >> 0.92 is 50% slower when scanning.
> >>>> > >>
> >>>> > >> In 0.90.0 random reading 50M rows that are OS cached I can do about
> >>>> > >> 200k reads per second.
> >>>> > >>
> >>>> > >> In 0.92.0, again with those same rows, I can go up to 260k per second.
> >>>> > >>
> >>>> > >> 0.92 is 30% faster when random reading.
> >>>> > >>
> >>>> > >> I've been playing with that data set for a while and the numbers in
> >>>> > >> 0.92.0 when using HFileV1 or V2 are pretty much the same, meaning that
> >>>> > >> something else changed or the code that's generic to both did.
> >>>> > >>
> >>>> > >>
> >>>> > >> I'd like to be able to associate those differences to code changes in
> >>>> > >> order to understand what's going on. I would really appreciate if
> >>>> > >> others also took some time to test it out or to think about what could
> >>>> > >> cause this.
> >>>> > >>
> >>>> > >> Thx,
> >>>> > >>
> >>>> > >> J-D
