[
https://issues.apache.org/jira/browse/HBASE-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628374#action_12628374
]
stack commented on HBASE-867:
-----------------------------
To be clear: if rows have thousands of columns or more (i.e. a canonical usage), hbase
does not work. Here is some of the problem code from StoreFileScanner#next:
{code}
...
while ((keys[i] != null)
    && (Bytes.compareTo(keys[i].getRow(), viableRow.getRow()) == 0)) {
  // If we are doing a wild card match or there are multiple matchers
  // per column, we need to scan all the older versions of this row
  // to pick up the rest of the family members
  if (!isWildcardScanner()
      && !isMultipleMatchScanner()
      && (keys[i].getTimestamp() != viableRow.getTimestamp())) {
    break;
  }
  if (columnMatch(i)) {
    // We only want the first result for any specific family member
    if (!results.containsKey(keys[i].getColumn())) {
      results.put(keys[i].getColumn(),
          new Cell(vals[i], keys[i].getTimestamp()));
      insertedItem = true;
    }
  } else {
    // Content is sorted. If column no longer matches, break.
    break;
  }
  if (!getNext(i)) {
    closeSubScanner(i);
  }
}
// Advance the current scanner beyond the chosen row, to
// a valid timestamp, so we're ready next time.
while ((keys[i] != null) &&
    ((Bytes.compareTo(keys[i].getRow(), viableRow.getRow()) <= 0)
    || (keys[i].getTimestamp() > this.timestamp)
    || (!columnMatch(i)))) {
  getNext(i);
}
...
{code}
The while loops find the next row by reading cells one at a time until the row
no longer matches. With many columns per row, that can take forever (as it is
doing in Daniel's case). We need a file format with an index that records where
the next row starts. An option would then say whether to reach the next row by
nexting through cells or by asking the index.
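To make the cost difference concrete, here is a minimal Java sketch (not HBase code). It assumes cells held in a list sorted by (row, column), as in a store file, plus a hypothetical row index: a `TreeMap` from each row key to the position of that row's first cell. `NextRowSketch`, `nextRowPosition` and its `useIndex` flag are illustrative names standing in for the proposed option; they are not an existing HBase API.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not HBase code: cells sorted by (row, column) as in a
// store file, with two ways to find where the next row starts.
public class NextRowSketch {

  // A single cell; row and column keys are plain Strings for simplicity.
  static final class Cell {
    final String row;
    final String column;
    Cell(String row, String column) {
      this.row = row;
      this.column = column;
    }
  }

  private final List<Cell> cells;  // assumed sorted by (row, column)
  // Assumed row index: row key -> position of that row's first cell.
  // A real file format would write this once at flush time; it is built
  // here in the constructor only to keep the sketch self-contained.
  private final TreeMap<String, Integer> rowIndex = new TreeMap<>();
  int cellsTouched = 0;  // cells read by linear nexting, for comparison

  NextRowSketch(List<Cell> sortedCells) {
    this.cells = sortedCells;
    for (int pos = 0; pos < sortedCells.size(); pos++) {
      rowIndex.putIfAbsent(sortedCells.get(pos).row, pos);
    }
  }

  // Returns the position of the first cell of the row after the one at pos,
  // or cells.size() if there is no next row. useIndex plays the role of the
  // proposed option: seek via the index, or next through cells.
  int nextRowPosition(int pos, boolean useIndex) {
    String currentRow = cells.get(pos).row;
    if (useIndex) {
      // O(log rows): independent of how many columns the row has.
      Map.Entry<String, Integer> next = rowIndex.higherEntry(currentRow);
      return next == null ? cells.size() : next.getValue();
    }
    // O(columns in row): read every remaining cell of the current row.
    int p = pos;
    while (p < cells.size() && cells.get(p).row.equals(currentRow)) {
      cellsTouched++;
      p++;
    }
    return p;
  }

  // Builds `rows` rows with `columnsPerRow` columns each, already sorted.
  static List<Cell> buildTable(int rows, int columnsPerRow) {
    List<Cell> cells = new ArrayList<>();
    for (int r = 0; r < rows; r++) {
      for (int c = 0; c < columnsPerRow; c++) {
        cells.add(new Cell(String.format("row%03d", r), String.format("col%07d", c)));
      }
    }
    return cells;
  }

  public static void main(String[] args) {
    NextRowSketch s = new NextRowSketch(buildTable(3, 100_000));
    int viaIndex = s.nextRowPosition(0, true);
    int viaNexting = s.nextRowPosition(0, false);
    System.out.println("index: " + viaIndex + ", nexting: " + viaNexting
        + " after reading " + s.cellsTouched + " cells");
  }
}
{code}

Both calls land on the same position (100000, the first cell of the second row), but the nexting path reads all 100000 cells of the first row to get there, while the index path does a single logarithmic lookup. That per-row cost is what the while loops above pay on every row.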
> If millions of columns in a column family, hbase scanner won't come up
> ----------------------------------------------------------------------
>
> Key: HBASE-867
> URL: https://issues.apache.org/jira/browse/HBASE-867
> Project: Hadoop HBase
> Issue Type: Bug
> Reporter: stack
>
> Our Daniel has uploaded a table that has a column family with millions of
> columns in it. He can get items from the table promptly when specifying row
> and column. Scanning is another matter. From a thread dump, I see we're stuck
> in the scanner constructor nexting through cells.