I remember someone objecting to the scheme. :) On Wed, Apr 23, 2008 at 6:46 AM, Clint Morgan <[EMAIL PROTECTED]> wrote: > The way I picture it, we only need to check the 100 or 1000 regions to > get the very first row of a order-by scan. Afterwards, we just need to > get the next row from the region that we took the last row from, > compare with the existing rows we have, and return the lowest. > > So if we have R regions to grab from, the first call to next() will > have to fetch all R rows from the regions and do R log R comparisons > to do the sort. Then each call to next() will cost 1 row fetch from > region plus log R comparisons to put in a sorted tree. > > > > On Tue, Apr 22, 2008 at 1:10 PM, Bryan Duxbury <[EMAIL PROTECTED]> wrote: > > If you just want the first 10 by a certain prefix ordered by a column, then > > you will definitely be better off scanning them by row and ordering them > > clientside. > > > > Surely your idea of maintaining a separate SortedMap in each region would > > work, but I don't think you should discount the cost associated with having > > to talk to a bunch of different regions every time you want to know what the > > next row is. You'll do a lot of extra work to get that merged view of the > > index, and potentially the approach won't scale up to queries that might > > cover more than a "few" regions - can you imagine having to check 100 or > > 1000 regions for the next result every time you needed to iterate? > > > > > > > > On Apr 22, 2008, at 12:58 PM, Clint Morgan wrote: > > > > > > > Yeah, that would be an easy approach. We would need HBASE-82. > > > > > > The main problem I see here is that we cannot take (as much) advantage > > > of our row key prefix in weeding out rows. > > > > > > Say I want the first 10 rows that start with XXX ordered by A:amount, > > > then I would have scan through column values from rows everywhere in > > > the table until I hit 10 with my prefix. Could be costly if table is > > > large compared to the number of rows that start with XXX. > > > > > > Whereas if we have one SortedMap per Region, then I can quickly narrow > > > down to (hopefully) a few regions based on key prefix. > > > > > > Though other usage / table loading patterns would surely benefit from > > > this approach... > > > > > > On Tue, Apr 22, 2008 at 12:31 PM, Bryan Duxbury <[EMAIL PROTECTED]> wrote: > > > > > > > This doesn't have to be all that complicated. > > > > > > > > Why not keep another HBase table as the index? The keys would be the > > column > > > > values. There'd be a single matchingRows column family, and the > > qualifiers > > > > and values would be the rows that match that column. Then, when you want > > to > > > > scan in column order instead of row order, you scan the index table, > > find > > > > the list of rows that match each column, and then do a random read to > > grab > > > > those individually. It'll for sure be slower than scanning a table > > ordered > > > > by rows, but it'll get you what you want. It'll also handle the case > > where > > > > the column values aren't unique. > > > > > > > > If you need custom sorting for those values, then HBASE-82 would solve > > that > > > > problem. > > > > > > > > -Bryan > > > > > > > > > > > > > > > > On Apr 22, 2008, at 11:58 AM, stack wrote: > > > > > > > > > > > > > > > > > Some questions interlaced below: > > > > > > > > > > Clint Morgan wrote: > > > > > > > > > > > > > > > > All, > > > > > > > > > > > > We want to put secondary indexes into hbase. The motivation is that > > we > > > > > > are storing data in hbase that we want to serve to users. We would > > > > > > like to be able to serve rows sorted by column values. Our queries > > > > > > will be over rows with a given key prefix, so we should not be > > hitting > > > > > > to many regions. > > > > > > I was thinking it would work roughly like this: > > > > > > > > > > > > - At table creation time, individual columns can be declared as > > > > > > indexed. By default we could sort the column values > > lexicographically, > > > > > > or we can provide a WritableComparatorFactory<T> which has the > > ability > > > > > > to make values of type T from a byte [], as well as providing a > > > > > > Comparator<T>. (Better than providing a Comparator<byte[]> as it > > only > > > > > > costs once per row insert for deserialization, rather that twice on > > > > > > each comparison). > > > > > > > > > > > > > > > > > > > > > > > > > > > > I don't follow what the Factory adds. > > > > > > > > > > We're talking about getting HBASE-82 into 0.2. Does that interfere > > with > > > > > > > > > this proposal? (I'm thinking that along w/ rows becoming byte arrays > > rather > > > > Text with a client-supplied Comparator, column qualifiers would shift to > > be > > > > byte arrays also -- though yeah, implies that if your sort is not > > > > byte-lexicographical, yes, the compares can be costly involving two > > > > deserializations per compare). > > > > > > > > > > > > > > > > > > > > - We catch all writes/deletes and maintain a SortedMap<T, HStoreKey> > > > > > > which keeps the column values in order, and maps them back to row > > > > > > keys. First cut may just keep all this in memory, but it should be > > > > > > backed with MapFile(s). > > > > > > > > > > > > > > > > > > > > > > > > > > > > Would be sweet if you could leverage the HBase memcache code and > > flusher > > > > > > > > > to do the above. > > > > > > > > > > > > > > This Map would be global for the table? Or per Region? > > > > > > > > > > A lucene index wouldn't work for you because you want ordering? > > > > > > > > > > > > > > > > > > > > > - Add to the hregion the ability to scan through keys in column > > order. > > > > > > Just iterate through the SortedMap, run a filter on the key, and if > > it > > > > > > passes do a get on the row. > > > > > > > > > > > > > > > > > > > > > > > > > > > > You'd be random reading rows. You're OK w/ current performance? (For > > > > > > > > > sure it will only improve but....). > > > > > > > > > > > > > > > > > > > > > > > > > - Add a ColumnOrderedClientScanner which will open column order > > > > > > scanners to all applicable hregions, and continuously pick row with > > > > > > the lowest column value from each of the client scanners. > > > > > > > > > > > > > > > > > > > > > > > > > > > > This scanner would have a significant client-side component to do the > > > > > > > > > arbitrage between all regions to figure the lowest column value? If you > > had > > > > a new type of 'region' -- one denoted by lowest and upper column then > > the > > > > client-side logic would fade away and your scanner would look like > > current > > > > scanners. > > > > > > > > > > > > > > > > > > > > - Region splits should be easy enough, just a scan through the > > > > > > SortedMap to partition. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Splits would not be row-based and run as they currently do, but rather > > > > > > > > > sorted-column based? > > > > > > > > > > > > > > > > > > > > > > > > > Of course, the index could also be used for more efficient querying > > on > > > > > > the indexed column's values. > > > > > > > > > > > > Do other users have a need for this functionality? > > > > > > > > > > > > What do developers think about this? I know hbase is more intended > > for > > > > > > back-end batch style processing, but we have this need. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > How are you thinking of adding in this new functionality? Subclassing > > > > > > > > > HRegionServer? > > > > > > > > > > > > > > St.Ack > > > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > -clint > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
-- B. Regards, Edward J. Yoon
