Re: Review Request: Inexpensive reseek operations (1517) and filter based scanning (2904)

stack Mon, 09 Aug 2010 21:04:45 -0700

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/574/#review794
-----------------------------------------------------------




trunk/src/main/java/org/apache/hadoop/hbase/filter/ColumnPrefixFilter.java
<http://review.cloudera.org/r/574/#comment2669>

    I understand why you did this new filter Pranav, and its the right way to 
introduce the notion, but at same time I can't help thinking that we should 
just eat it and make this functionality part of the read path.   How about we 
get this patch in as is and then come back to integrate more directly in a new 
issue?



trunk/src/main/java/org/apache/hadoop/hbase/filter/ColumnPrefixFilter.java
<http://review.cloudera.org/r/574/#comment2674>

    Can you add better explaination of what this does?  This is the only 
documentation on this filter.  Provide an example in the javadoc?



trunk/src/main/java/org/apache/hadoop/hbase/filter/Filter.java
<http://review.cloudera.org/r/574/#comment2670>

    This is SEEK_NEXT_KEY_USING_FILTER?  Would suggest you not shorten the key 
name.



trunk/src/main/java/org/apache/hadoop/hbase/filter/Filter.java
<http://review.cloudera.org/r/574/#comment2671>

    This will never cross rows, will it?
    
    If so, filters can't carry state across rows (Ask me why if you need a why 
on this).



trunk/src/main/java/org/apache/hadoop/hbase/filter/FilterBase.java
<http://review.cloudera.org/r/574/#comment2672>

    .. that, by default, returns a null....



trunk/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java
<http://review.cloudera.org/r/574/#comment2673>

    Does FilterList inherit from FilterBase?  If so, is this nextKey 
implementation needed?



trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileScanner.java
<http://review.cloudera.org/r/574/#comment2675>

    I'm not sure I follow... a bit more clarity in the comment would help (This 
looks like Ryan-speak!)



trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileScanner.java
<http://review.cloudera.org/r/574/#comment2676>

    How we know there are keys after this one?



trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
<http://review.cloudera.org/r/574/#comment2677>

    So, we seeking to the next column on this row or on the next?  If one 
current row, isn't it rare that the next key will NOT be the next column?


- stack


On 2010-08-08 17:01:04, Pranav Khaitan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/574/
> -----------------------------------------------------------
> 
> (Updated 2010-08-08 17:01:04)
> 
> 
> Review request for hbase, stack, Jonathan Gray, Ryan Rawson, Karthik 
> Ranganathan, and Kannan Muthukkaruppan.
> 
> 
> Summary
> -------
> 
> What this patch includes:
> 1. Reseek framework. The ability to reseek to any position after having 
> seeked to some point in the file. To add this utility, changes were required 
> in all scanners.
> 2. The option for any filter to be able to tell the scanner which  key it 
> wants to go to next. Filters can be easily customized for different use-cases 
> without affecting the main read path. Since filters are optional, they do not 
> add any overhead for users who do not take advantage of it.
> 3. ColumnPrefixFilter: This filter serves the purpose of selecting keys with 
> columns having a specified prefix. The filter takes advantage of theability 
> to pass keys to the scanner to tell which key it should seek to next.
> 4. This also gives the option to seek directly to the required columns using 
> reseek mechanism (HBASE-2450). However, it needs to be decided if that 
> feature should be made optional using a filter or should it be added to the 
> read path to be used by everyone. Did not include it in this patch since it 
> required further discussions and testing.
> 5. Small changes to ScanQueryMatcher to return more specific return codes.
> 
> For HFile and reseek, the modifications were done after discussions with Ryan 
> and he had also written some code for this patch. For ScanQueryMatcher and 
> Filters, discussions were held with Jonathan, Karthik and Kannan.
> 
> This is big as it touches 21 files. Some issues, particularly those related 
> to HFiles and reseek implementations, are tricky and require closer reviews.
> 
> Modifications which couldn't be done within the scope of this work and should 
> be considered in future:
> 1. Ability to seek directly to next row. Currently, since we do not know what 
> the next row is, we also cannot pass a key to the scanner to go to that 
> position. 
> 2. Refactoring ScanQueryMatcher: The match function has become significantly 
> big and can be broken down into simpler components.
> 
> 
> This addresses bugs HBASE-1517, HBASE-2903 and HBASE-2904.
>     http://issues.apache.org/jira/browse/HBASE-1517
>     http://issues.apache.org/jira/browse/HBASE-2903
>     http://issues.apache.org/jira/browse/HBASE-2904
> 
> 
> Diffs
> -----
> 
>   trunk/src/main/java/org/apache/hadoop/hbase/filter/ColumnPrefixFilter.java 
> PRE-CREATION 
>   trunk/src/main/java/org/apache/hadoop/hbase/filter/Filter.java 983321 
>   trunk/src/main/java/org/apache/hadoop/hbase/filter/FilterBase.java 983321 
>   trunk/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java 983321 
>   trunk/src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java 
> 983321 
>   trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 
> 983321 
>   trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 983321 
>   trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileScanner.java 
> 983321 
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java 
> 983321 
>   
> trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java 
> 983321 
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 
> 983321 
>   
> trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MinorCompactingStoreScanner.java
>  983321 
>   
> trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
>  983321 
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 983321 
>   
> trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
>  983321 
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 
> 983321 
>   
> trunk/src/test/java/org/apache/hadoop/hbase/filter/TestColumnPrefixFilter.java
>  PRE-CREATION 
>   trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestReseekTo.java 
> PRE-CREATION 
>   
> trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java
>  983321 
>   
> trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java
>  983321 
>   
> trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java
>  983321 
> 
> Diff: http://review.cloudera.org/r/574/diff
> 
> 
> Testing
> -------
> 
> Added tests at HFileScanner and Filter/RegionScanner levels. The time taken 
> for running these tests is very less. All existing tests pass successfully. 
> Performance benchmarking was done and significant gains in performance can be 
> seen for corresponding use-cases.
> 
> 
> Thanks,
> 
> Pranav
> 
>

Re: Review Request: Inexpensive reseek operations (1517) and filter based scanning (2904)

Reply via email to