Andy (Thalmann) recently sent me the following commit.  It solves a problem
he's seeing, and he wanted to know if I'd be willing to pull it in.  He has
allowed me to reply here on the mailing list so that others can benefit
and/or offer an opinion.

https://github.com/andysoftdev/hypertable/commit/d69f4ad274bccc8c9971e70a31ec705b2b1867d5

In a nutshell, this commit is an optimization for the case where you want
to query a large set of specific rows (say, 10,000 or 100,000).  It reduces
the number of network round trips the client needs to make (currently one
per row) by doing a full table scan, passing the entire row set to each
range scan, and having each range scan keep only the matching rows.  This
reduces the number of network round trips to the number of ranges in the
table.
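To make the trade-off concrete, here is a minimal sketch of the scan-and-filter idea as described above.  It is not the code from the commit: `Range`, `scan_and_filter` and `query_rows` are hypothetical names, and a "range" is modeled simply as a sorted list of row keys held in memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for one range: the row keys it stores, sorted.
struct Range {
  std::vector<std::string> rows;
};

// "Server" side of the sketch: scan every row in the range and keep only
// the rows that appear in the row set sent along with the request.
std::vector<std::string> scan_and_filter(const Range &range,
                                         const std::set<std::string> &rowset) {
  assert(std::is_sorted(range.rows.begin(), range.rows.end()));
  std::vector<std::string> hits;
  for (const auto &row : range.rows)
    if (rowset.count(row) != 0)
      hits.push_back(row);
  return hits;
}

// "Client" side: one request per range (not per row); each request carries
// the entire row set.  `round_trips` reports how many requests were made.
std::vector<std::string> query_rows(const std::vector<Range> &table,
                                    const std::set<std::string> &rowset,
                                    std::size_t &round_trips) {
  std::vector<std::string> result;
  round_trips = 0;
  for (const auto &range : table) {
    ++round_trips;
    std::vector<std::string> hits = scan_and_filter(range, rowset);
    result.insert(result.end(), hits.begin(), hits.end());
  }
  return result;
}
```

Called with three ranges and the row set {b, e, g, z}, where b, e and g each live in a different range, it returns b, e and g after three round trips; the one-request-per-row approach would make four.  The cost is that every range is scanned in full, even ranges that hold none of the requested rows.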

I think the ideal solution would do the following:

1. Only involve the set of ranges required to cover the row set being
queried.
2. For each range involved in the query, either do a set of random lookups
or scan the entire range, depending on how much of the range the queried
rows cover.  For example, if a particular range (potentially) contains only
a single row from the set, it would be inefficient to scan the entire range
just to pick out that single row.
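The two steps above can be sketched as a small planning function.  This assumes each range is identified by its end row and covers the rows after the previous range's end row up to and including its own, and that an approximate per-range row count is available; `RangeInfo`, `plan_query` and `scan_threshold` are hypothetical names, and the threshold is an illustrative knob rather than an existing Hypertable setting.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one range, ordered by end_row.
struct RangeInfo {
  std::string end_row;    // last row key covered by this range (inclusive)
  std::size_t row_count;  // approximate number of rows stored in the range
};

enum class Strategy { Lookup, Scan };

// One unit of work: a range plus the requested rows that fall inside it.
struct RangeWork {
  std::size_t range_index;
  std::vector<std::string> rows;
  Strategy strategy;
};

// Step 1: touch only ranges that contain at least one requested row.
// Step 2: per range, scan when coverage >= scan_threshold, else do lookups.
std::vector<RangeWork> plan_query(const std::vector<RangeInfo> &ranges,
                                  std::vector<std::string> rows,
                                  double scan_threshold) {
  assert(std::is_sorted(ranges.begin(), ranges.end(),
                        [](const RangeInfo &a, const RangeInfo &b) {
                          return a.end_row < b.end_row;
                        }));
  std::sort(rows.begin(), rows.end());
  rows.erase(std::unique(rows.begin(), rows.end()), rows.end());

  std::vector<RangeWork> plan;
  std::size_t r = 0;
  for (const auto &row : rows) {
    // Advance to the first range whose end_row is >= row.
    while (r < ranges.size() && ranges[r].end_row < row)
      ++r;
    if (r == ranges.size())
      break;  // remaining rows sort after the last range's end row
    if (plan.empty() || plan.back().range_index != r)
      plan.push_back({r, {}, Strategy::Lookup});
    plan.back().rows.push_back(row);
  }

  for (auto &work : plan) {
    std::size_t total = ranges[work.range_index].row_count;
    double coverage =
        total ? static_cast<double>(work.rows.size()) / total : 1.0;
    work.strategy = coverage >= scan_threshold ? Strategy::Scan
                                               : Strategy::Lookup;
  }
  return plan;
}
```

With ranges ending at "f" (100 rows), "m" (4 rows) and "z" (1000 rows), a threshold of 0.5, and the rows {h, b, j, i}, the plan skips the third range entirely, does a single lookup in the first, and scans the second, since three of its four rows were requested.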

This would involve a fair amount of work on the client side (TableScanner
and IntervalScanner) as well as some work in the RangeServer.  If this is
too much work for you right now, we could probably do something more along
the lines of your existing commit, but in a way that is more in line with
the ideal solution.  My concern with your existing commit is the added
"rowset" member of the ScanSpec.  I think it would be better to use the
existing "row_intervals" member and add a "scan_and_filter_rows" boolean.
If you don't want to handle the interval case right now, then whenever the
scan_and_filter_rows flag is set, you could sanity-check row_intervals to
make sure it contains exact row matches only.  Using the existing
"row_intervals" member would make it a lot easier to plumb the change
through the ThriftBroker and HQL.
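As a rough illustration of the suggested interim check, the sketch below uses simplified, hypothetical `ScanSpec` and `RowInterval` structs (the real Hypertable types may differ in detail) and treats an exact row match as an interval whose start and end are the same row, both inclusive.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified, hypothetical stand-ins; the real Hypertable ScanSpec and
// RowInterval types may differ in detail.
struct RowInterval {
  std::string start;
  bool start_inclusive;
  std::string end;
  bool end_inclusive;
};

struct ScanSpec {
  std::vector<RowInterval> row_intervals;
  bool scan_and_filter_rows = false;  // flag proposed in this thread
};

// True if every interval names exactly one row, i.e. [row, row].
bool intervals_are_exact_rows(const ScanSpec &spec) {
  for (const auto &ri : spec.row_intervals)
    if (ri.start != ri.end || !ri.start_inclusive || !ri.end_inclusive)
      return false;
  return true;
}

// Interim sanity check: when scan_and_filter_rows is set, accept only a
// non-empty list of exact row matches; otherwise leave the spec alone.
bool validate_scan_spec(const ScanSpec &spec) {
  if (!spec.scan_and_filter_rows)
    return true;
  return !spec.row_intervals.empty() && intervals_are_exact_rows(spec);
}

// Build a scan-and-filter spec from a list of row keys.
ScanSpec make_row_set_spec(const std::vector<std::string> &rows) {
  ScanSpec spec;
  spec.scan_and_filter_rows = true;
  for (const auto &row : rows)
    spec.row_intervals.push_back({row, true, row, true});
  // Every interval built above is an exact match by construction.
  assert(intervals_are_exact_rows(spec));
  return spec;
}
```

Because a row set is just a list of single-row intervals here, a later version could relax the check to accept real intervals without changing the ScanSpec layout.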

- Doug

-- 
You received this message because you are subscribed to the Google Groups 
"Hypertable Development" group.
