On Wed, May 15, 2013 at 1:20 PM, lars hofhansl <[email protected]> wrote:
> Do you have some more details?

Yes, the rows have 50 columns each when we use a wide schema. Unfortunately, this was a while back: we tried to go tall, found performance to be poor, and eventually switched to wide. I say "unfortunately" because I don't remember the exact performance numbers. Now we have a use case where we may have much wider rows (millions of columns), so because of these outliers we prefer tall. I should probably try reproducing the same test case again. We basically saw significantly more iowait and I/O with the tall schema than with the wide (get-based) schema as we increased the load.

> Why would a scan in a tall schema be all over the place but in a wide
> schema it is not?

It is random in both cases - the scans are as random as the gets. That was probably a mistake in my email below.

> How wide were the rows before? About 50 columns?

Yes, 50 columns or so (could be up to 100, but not many more).

> -- Lars
>
> ----- Original Message -----
> From: Varun Sharma <[email protected]>
> To: "[email protected]" <[email protected]>
> Cc:
> Sent: Wednesday, May 15, 2013 11:58 AM
> Subject: Re: Where is scanner startRow used
>
> Yeah i just checked that we were already using startRow and its still
> significantly poorer performance than the wide schema (close to unusable)
>
> We are doing scans of 50 batch size but the scans are all over the place -
> very random because the schema is tall and not wide. I have created a JIRA
> for the same and I will report performance numbers there. But to me, not
> seeking to the start row within a region feels clearly suboptimal.
>
> Thanks
> Varun
>
> On Wed, May 15, 2013 at 11:48 AM, Anoop John <[email protected]> wrote:
>
> > At client side see ScannerCallable where this is passed to
> > ServerCallable.. Based on this only which regions should be 1st scanned is
> > decided..
> > I think some time back also the prefix filter was discussed. At that time
> > also the conclusion was to use the start row. U can set a start row now
> > right? Pls check the perf with this once.
> >
> > -Anoop-
> >
> > On Thu, May 16, 2013 at 12:02 AM, Varun Sharma <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > Could someone please point me to where Scan.startRow is being used ?
> > >
> > > From what I can see in HRegion.RegionScannerImpl, it is unused. A grep does
> > > not seem to return any valid entries. But my knowledge of this part is
> > > limited.
> > >
> > > We are debugging poor performance on prefix scans in tall schemas. If this
> > > is really an issue, I will open a JIRA...
> > >
> > > Varun
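
For reference, the "use the start row" advice for prefix scans discussed in this thread can be sketched as follows. This is only an illustration, not code from anyone on the thread: the class and method names (PrefixRange, startRowForPrefix, stopRowForPrefix) are hypothetical, and it assumes row keys are compared as unsigned bytes in lexicographic order, as HBase does. The resulting pair would then be handed to the client, e.g. new Scan(startRow, stopRow) in the client API of that era.

```java
import java.util.Arrays;

/**
 * Hypothetical helper: computes the [startRow, stopRow) range that covers
 * every row key beginning with a given prefix.
 */
final class PrefixRange {
    private PrefixRange() {}

    /** The start row of a prefix scan is simply a copy of the prefix. */
    static byte[] startRowForPrefix(byte[] prefix) {
        return Arrays.copyOf(prefix, prefix.length);
    }

    /**
     * Returns the smallest key that sorts after every key starting with
     * the prefix: drop trailing 0xFF bytes, then increment the last
     * remaining byte. An empty array is returned when no such key exists
     * (empty or all-0xFF prefix); HBase treats an empty stop row as
     * "scan to the end of the table".
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        // Trailing 0xFF bytes cannot be incremented, so they are dropped.
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // wraps 0x7F to 0x80, i.e. an unsigned increment
        return stop;
    }
}
```

With a range like this the scanner can open at the first matching key and stop at the stop row, whereas a prefix filter on its own does not tell the scanner where to begin.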
