[jira] Commented: (HBASE-867) If millions of columns in a column family, hbase scanner won't come up

stack (JIRA) Thu, 04 Sep 2008 09:30:37 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628374#action_12628374
 ]


stack commented on HBASE-867:
-----------------------------

To be clear: with thousands of columns or more in a family -- i.e. a canonical 
usage -- hbase does not work.  Here is some of the problem code from 
StoreFileScanner#next:

{code}
...
          while ((keys[i] != null)
              && (Bytes.compareTo(keys[i].getRow(), viableRow.getRow()) == 0)) {

            // If we are doing a wild card match or there are multiple matchers
            // per column, we need to scan all the older versions of this row
            // to pick up the rest of the family members
            if(!isWildcardScanner()
                && !isMultipleMatchScanner()
                && (keys[i].getTimestamp() != viableRow.getTimestamp())) {
              break;
            }

            if (columnMatch(i)) {              
              // We only want the first result for any specific family member
              if(!results.containsKey(keys[i].getColumn())) {
                results.put(keys[i].getColumn(), 
                    new Cell(vals[i], keys[i].getTimestamp()));
                insertedItem = true;
              }
            } else {
              // Content is sorted.  If column no longer matches, break.
              break;
            }

            if (!getNext(i)) {
              closeSubScanner(i);
            }
          }

          // Advance the current scanner beyond the chosen row, to
          // a valid timestamp, so we're ready next time.
          while ((keys[i] != null) &&
              ((Bytes.compareTo(keys[i].getRow(), viableRow.getRow()) <= 0)
                  || (keys[i].getTimestamp() > this.timestamp)
                  || (! columnMatch(i)))) {
            getNext(i);
          }
..
{code}

Both while loops find the next row by getting cells one at a time until the 
row no longer matches.  With many columns per row, that can take forever (as 
it's doing in Daniel's case).  We need a file format with an index that says 
where the next row is.  An option would then say whether to get to the next 
row by nexting or by asking the index instead.
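
To make the idea concrete, here is a rough sketch of the two ways of getting 
to the next row.  It is not HBase code: RowIndexSketch, its String row keys 
(standing in for byte[] rows compared with Bytes.compareTo), and the in-memory 
TreeMap index are all made up for illustration.  A real file format would keep 
the index on disk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustration only; every name here is hypothetical, not an HBase class. */
public class RowIndexSketch {
    /** Stand-in for a store file: the row key of each cell, in sorted order. */
    private final List<String> cellRows;
    /** Assumed row index: row key -> offset of that row's first cell. */
    private final TreeMap<String, Integer> rowIndex = new TreeMap<>();

    public RowIndexSketch(List<String> sortedCellRows) {
        this.cellRows = new ArrayList<>(sortedCellRows);
        for (int i = 0; i < cellRows.size(); i++) {
            // Only the first cell of each row is recorded in the index.
            rowIndex.putIfAbsent(cellRows.get(i), i);
        }
    }

    /** Nexting: step over every remaining cell of the current row. */
    public int nextRowByNexting(int pos) {
        String row = cellRows.get(pos);
        int i = pos;
        while (i < cellRows.size() && cellRows.get(i).equals(row)) {
            i++; // one step per cell, so cost grows with columns per row
        }
        return i < cellRows.size() ? i : -1;
    }

    /** Index: a single sorted-map lookup, however wide the row is. */
    public int nextRowByIndex(int pos) {
        Map.Entry<String, Integer> next = rowIndex.higherEntry(cellRows.get(pos));
        return next == null ? -1 : next.getValue();
    }

    /** The "option": choose between nexting and asking the index. */
    public int nextRow(int pos, boolean useIndex) {
        return useIndex ? nextRowByIndex(pos) : nextRowByNexting(pos);
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            rows.add("row1"); // one very wide row
        }
        rows.add("row2");
        RowIndexSketch file = new RowIndexSketch(rows);
        System.out.println("by nexting: " + file.nextRow(0, false));
        System.out.println("by index:   " + file.nextRow(0, true));
    }
}
```

Both calls return the same offset (or -1 when there is no further row); the 
difference is that nexting touches every cell of the wide row while the index 
lookup does not.  For narrow rows nexting is probably cheaper than an index 
probe, which is why it would be an option rather than the only path.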


> If millions of columns in a column family, hbase scanner won't come up
> ----------------------------------------------------------------------
>
>                 Key: HBASE-867
>                 URL: https://issues.apache.org/jira/browse/HBASE-867
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>
> Our Daniel has uploaded a table that has a column family with millions of 
> columns in it.  He can get items from the table promptly specifying row and 
> column.  Scanning is another matter.  Thread dumping I see we're stuck in the 
> scanner constructor nexting through cells.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
