Re: Question about HFile seeking

Stack Thu, 16 May 2013 14:51:03 -0700

What is your query?

If scanning over rows of 100k columns, yeah, you will go through each row's
content unless you specify you are only interested in some subset of the rows.
Then a 'skipping' facility will cut in, where we use the index to skip
over unwanted content.
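
A minimal sketch of what "use the index to skip" can look like, assuming a block index that stores the first key of every block in sorted order. This is not HBase's actual code: the class name BlockIndexSketch, the method findBlock, and the use of plain Strings as keys are all hypothetical, chosen only to illustrate the idea. A seek binary-searches the index for the last block whose first key is <= the target, so every earlier block is never read.

```java
class BlockIndexSketch {
    /**
     * Returns the index of the last block whose first key is <= target.
     * If the target sorts before every first key, block 0 is returned.
     * Assumes firstKeys is sorted ascending and non-empty (hypothetical layout).
     */
    static int findBlock(String[] firstKeys, String target) {
        int lo = 0;
        int hi = firstKeys.length - 1;
        int answer = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (firstKeys[mid].compareTo(target) <= 0) {
                answer = mid;   // this block may hold the target; look further right
                lo = mid + 1;
            } else {
                hi = mid - 1;   // this block starts past the target
            }
        }
        return answer;
    }
}
```

Note that the block returned is the one that may contain the target, so a seek to the start of row2 can land in a block that begins with the tail of row1; the scanner then reads forward within that one block only.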


St.Ack



On Thu, May 16, 2013 at 2:42 PM, Varun Sharma <va...@pinterest.com> wrote:

> Nothing, I am just curious...
>
> So, we will do a bunch of wasteful scanning - that is, let's say row1 has col1
> - col100000 - basically 100K columns, we will scan all those key values
> even though we are going to discard them, is that correct ?
>
>
> On Thu, May 16, 2013 at 2:30 PM, Stack <st...@duboce.net> wrote:
>
> > What are you seeing, Varun (or think you are seeing)?
> > St.Ack
> >
> >
> > On Thu, May 16, 2013 at 2:30 PM, Stack <st...@duboce.net> wrote:
> >
> > > On Thu, May 16, 2013 at 2:03 PM, Varun Sharma <va...@pinterest.com> wrote:
> > >
> > >> Or do we use some kind of demarcator b/w rows and columns and timestamps
> > >> when building the HFile keys and the indices ?
> > >>
> > >
> > > No demarcation but in KeyValue, we keep row, column family name, column
> > > family qualifier, etc., lengths and offsets so the comparators only
> > > compare pertinent bytes.
> > >
> > > If you are doing a prefix scan w/ row1c, we should be starting the scan at
> > > row1c, not row1 (or more correctly at the row that starts the block we
> > > believe has a row1c row in it...).
> > >
> > > St.Ack
> > >
> >
>
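
The quoted point about keeping lengths and offsets instead of a demarcator can be sketched as follows. This is an illustration under stated assumptions, not the real KeyValue class: the layout used here ([2-byte row length][row][1-byte family length][family][qualifier]) is a simplification that leaves out the timestamp and type, and the names KeyFieldCompareSketch, encode, compareRows and compareKeys are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class KeyFieldCompareSketch {
    // Hypothetical simplified layout:
    // [2-byte row length][row][1-byte family length][family][qualifier]
    static byte[] encode(String row, String family, String qualifier) {
        byte[] r = row.getBytes(StandardCharsets.UTF_8);
        byte[] f = family.getBytes(StandardCharsets.UTF_8);
        byte[] q = qualifier.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[2 + r.length + 1 + f.length + q.length];
        out[0] = (byte) (r.length >> 8);
        out[1] = (byte) r.length;
        System.arraycopy(r, 0, out, 2, r.length);
        out[2 + r.length] = (byte) f.length;
        System.arraycopy(f, 0, out, 3 + r.length, f.length);
        System.arraycopy(q, 0, out, 3 + r.length + f.length, q.length);
        return out;
    }

    static int rowLength(byte[] key) {
        return ((key[0] & 0xff) << 8) | (key[1] & 0xff);
    }

    // Unsigned lexicographic comparison of two byte ranges.
    static int compareRange(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int d = (a[aOff + i] & 0xff) - (b[bOff + i] & 0xff);
            if (d != 0) return d;
        }
        return aLen - bLen;
    }

    // Compares only the row bytes, located through the stored length.
    static int compareRows(byte[] a, byte[] b) {
        return compareRange(a, 2, rowLength(a), b, 2, rowLength(b));
    }

    // Compares row, then family, then qualifier, each within its own bounds.
    static int compareKeys(byte[] a, byte[] b) {
        int ra = rowLength(a), rb = rowLength(b);
        int c = compareRange(a, 2, ra, b, 2, rb);
        if (c != 0) return c;
        int fa = a[2 + ra] & 0xff, fb = b[2 + rb] & 0xff;
        c = compareRange(a, 3 + ra, fa, b, 3 + rb, fb);
        if (c != 0) return c;
        int qa = 3 + ra + fa, qb = 3 + rb + fb;
        return compareRange(a, qa, a.length - qa, b, qb, b.length - qb);
    }
}
```

Because each field is bounded by its stored length, row "row1" with qualifier "c1" and row "row1c" with qualifier "1" stay distinct rows, and "row1" sorts before "row1c" regardless of what the family or qualifier bytes are.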
