[jira] [Commented] (HBASE-7474) Endpoint Implementation to support Scans with Sorting of Rows based on column values(similar to "order by" clause of RDBMS)

Ted Yu (JIRA) Tue, 08 Jan 2013 13:44:14 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-7474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547313#comment-13547313
 ]


Ted Yu commented on HBASE-7474:
-------------------------------

License headers in SortingClient.java and 
BigDecimalSortingColumnInterpreter.java are not properly formatted.
Some log statements, such as the following, can be at debug level.
{code}
+      log.info("Querying only one region for sorting");
{code}

{code}
+            if (sortDecreasing) return instance.sortDecreasing(scan, columnFamily, columnQualifier,
+              colInterpreter, startIndex, pageSize, true);
+            else return instance.sortIncreasing(scan, columnFamily, columnQualifier,
{code}
'else' keyword is not needed above.
{code}
+   * This method is used to do the merge sort the rows from multiple regions and produce the final output
{code}
Remove 'do the'. Wrap long line.
{code}
+    for (Map.Entry<byte[], Result[]> regionResultsEntryMap : regionResultMap.entrySet()) {
{code}
regionResultsEntryMap -> regionResultsEntry or regionResultsMapEntry
{code}
+    if(totalNoOfRows < startIndex)
+    {
{code}
Normally the left brace goes on the same line as the if statement. Insert a 
space between 'if' and '('.

currentMaxorMinValueRegion and maxOrMin are used in the if / else blocks. You 
can move their declarations inside those blocks and give them names whose 
meaning is clearer.
{code}
+        for (Result[] regionResult : regionResults) {
+          if ((regionResult.length - 1) < arrayIndex[regionNum]) {
{code}
regionResults and arrayIndex are both arrays, so you can use the same index to 
access both of them. In my opinion that makes the code more readable.
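For example, something along these lines. This is a sketch only: IndexSketch and countRegionsWithRowsLeft are names I made up for illustration, and String[][] stands in for the Result[][] used in the patch.

```java
final class IndexSketch {
  /**
   * Counts the regions that still have unread rows.
   * arrayIndex[regionNum] is the position of the next unread row
   * in regionResults[regionNum].
   */
  static int countRegionsWithRowsLeft(String[][] regionResults, int[] arrayIndex) {
    int remaining = 0;
    // One index drives both arrays, so no separate counter is needed.
    for (int regionNum = 0; regionNum < regionResults.length; regionNum++) {
      if (arrayIndex[regionNum] < regionResults[regionNum].length) {
        remaining++;
      }
    }
    return remaining;
  }
}
```

With a single loop variable there is no risk of regionNum drifting out of step with the enhanced-for iteration.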
{code}
+          finalResult[finalResultCurrentSize++] = regionResults[currentMaxorMinValueRegion][arrayIndex[currentMaxorMinValueRegion]];
{code}
Wrap long line above.
{code}
+          if (colInterpreter.compare(tmp, maxOrMin) > 0) {
{code}
If I read the code correctly, the above comparison is the major difference 
between ascending and descending sorting. A little abstraction would allow you 
to unify the two cases.
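One possible shape for that abstraction is to pass the ordering in as a Comparator and reverse it for the descending case. The sketch below is not taken from the patch: MergeSketch, merge and nextIndex are hypothetical names, and plain lists stand in for the per-region Result[] arrays. It assumes each region's rows are already sorted in the requested order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class MergeSketch {
  /**
   * Merges per-region lists that are each already sorted according to 'order'.
   * Pass Collections.reverseOrder(cmp) as 'order' for the descending case.
   */
  static <T> List<T> merge(List<List<T>> regionResults, Comparator<? super T> order) {
    int[] nextIndex = new int[regionResults.size()]; // next unread row per region
    List<T> merged = new ArrayList<T>();
    while (true) {
      int bestRegion = -1;
      for (int region = 0; region < regionResults.size(); region++) {
        List<T> rows = regionResults.get(region);
        if (nextIndex[region] >= rows.size()) {
          continue; // this region is exhausted
        }
        // The single comparison below is the only place ordering matters.
        if (bestRegion < 0
            || order.compare(rows.get(nextIndex[region]),
                regionResults.get(bestRegion).get(nextIndex[bestRegion])) < 0) {
          bestRegion = region;
        }
      }
      if (bestRegion < 0) {
        return merged; // every region is exhausted
      }
      merged.add(regionResults.get(bestRegion).get(nextIndex[bestRegion]++));
    }
  }
}
```

The ascending and descending paths then share one merge loop, and the caller picks the direction by choosing the comparator.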

Looking at SortingColumnInterpreter, this is the only method which is not 
present in ColumnInterpreter:
{code}
+  T getValue(KeyValue kv) throws IOException;
{code}
We're trying to reduce exposure of KeyValue, which means the following method 
is favored:
{code}
  public abstract T getValue(byte[] colFamily, byte[] colQualifier, KeyValue kv)
      throws IOException;
{code}
Please consider removing SortingColumnInterpreter.
                
> Endpoint Implementation to support Scans with Sorting of Rows based on column 
> values(similar to "order by" clause of RDBMS)
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7474
>                 URL: https://issues.apache.org/jira/browse/HBASE-7474
>             Project: HBase
>          Issue Type: New Feature
>          Components: Coprocessors, Scanners
>    Affects Versions: 0.94.3
>            Reporter: Anil Gupta
>            Priority: Minor
>              Labels: coprocessors, scan, sort
>             Fix For: 0.94.5
>
>         Attachments: hbase-7474.patch, hbase-7474-v2.patch, 
> SortingEndpoint_high_level_flowchart.pdf
>
>
> Recently, I developed an Endpoint which can sort the Results (rows) on 
> the basis of column values. This functionality is similar to the "order by" 
> clause of an RDBMS. I will be submitting this patch for HBase 0.94.3.
> I am almost done with the initial development and testing of the feature, but I 
> need to write the JUnit tests for it. I will also try to write a design doc.
> Thanks,
> Anil Gupta
> Software Engineer II, Intuit, inc

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
