[jira] Commented: (HBASE-867) If millions of columns in a column family, hbase scanner won't come up

Ben Maurer (JIRA) Thu, 19 Feb 2009 20:16:28 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675234#action_12675234
 ]


Ben Maurer commented on HBASE-867:
----------------------------------

I confirmed that this code is what was causing crashes for me. I had an MR job 
that launched multiple scanners on a region while making updates to the same 
column family they were scanning (but not the same column). As a result, many 
processes had to grep through all of the irrelevant inserts repeatedly as 
flushes occurred.

I think that this case could be fixed in 0.19.0, and furthermore I think the 
fix might actually clean up the code a lot:

(10:58:18 PM) BenM: yeah
(10:58:21 PM) BenM: was just doing that
(10:58:30 PM) BenM: IMHO, this is a somewhat easier issue to fix
(10:58:38 PM) BenM: i think it could be done in a way that cleans up the code
(10:58:50 PM) BenM: right now, the code just scans through each of the map files
(10:59:02 PM) BenM: without regard to the relative key positions
(10:59:12 PM) BenM: i think it could use a priority queue so that it only works 
on the relevant files
(11:01:22 PM) St^Ack_: BenM: please expand, I don't follow exactly
(11:01:50 PM) BenM: let's say we have two map files
(11:02:09 PM) BenM: one with 1/foo:bar 2/foo:bar 3/foo:bar
(11:02:17 PM) BenM: (row/family:col)
(11:02:31 PM) BenM: and the other with 1000/blah:blah 1001/blah:blah
(11:02:39 PM) BenM: the current logic is
(11:02:44 PM) BenM: for each map file:
(11:02:56 PM) BenM:    find the first potential row in this file
(11:03:08 PM) BenM: look at min(all potential rows)
(11:03:34 PM) BenM: the algorithm should be:
(11:03:43 PM) BenM: q = new PriorityQueue()
(11:04:05 PM) BenM: for each map file: insert the HStoreKey of the first key in 
the file
(11:04:17 PM) BenM: while(k = q.pop()) {
(11:04:37 PM) BenM:   if (k is interesting) break;
(11:04:37 PM) BenM:   advance k
(11:04:37 PM) BenM:   q.push(k)
(11:04:38 PM) BenM: }
(11:05:00 PM) BenM: that way, we don't try to find a matching key in the larger 
rows
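
To make the idea from the chat concrete, here is a minimal Java sketch of the 
priority-queue loop. It is not HBase code: MergeScanSketch, Cursor and 
firstInteresting are hypothetical names, plain strings stand in for HStoreKey, 
and in-memory sorted lists stand in for map files. The row keys are zero-padded 
so that string order matches the numeric order implied in the chat.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Predicate;

public class MergeScanSketch {

    /** Hypothetical stand-in for a cursor over one sorted map file. */
    static final class Cursor implements Comparable<Cursor> {
        private final Iterator<String> rest;
        String current;   // key the cursor is positioned on, null when exhausted
        int advances = 0; // how many times this cursor was moved forward

        Cursor(List<String> sortedKeys) {
            this.rest = sortedKeys.iterator();
            this.current = rest.hasNext() ? rest.next() : null;
        }

        /** Moves to the next key; returns false when the file is exhausted. */
        boolean advance() {
            advances++;
            current = rest.hasNext() ? rest.next() : null;
            return current != null;
        }

        /** Orders cursors by the key they are positioned on. */
        @Override
        public int compareTo(Cursor other) {
            return current.compareTo(other.current);
        }
    }

    /**
     * Returns the smallest key (in global order) accepted by the predicate,
     * or null if no file contains one. Only the cursor holding the current
     * minimum is ever advanced.
     */
    static String firstInteresting(List<Cursor> cursors, Predicate<String> interesting) {
        PriorityQueue<Cursor> q = new PriorityQueue<>();
        for (Cursor c : cursors) {
            if (c.current != null) {
                q.add(c); // seed the queue with the first key of each file
            }
        }
        Cursor c;
        while ((c = q.poll()) != null) {
            if (interesting.test(c.current)) {
                return c.current;
            }
            if (c.advance()) {
                q.add(c); // re-queue under its new key
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Cursor a = new Cursor(Arrays.asList("0001/foo:bar", "0002/foo:bar", "0003/foo:bar"));
        Cursor b = new Cursor(Arrays.asList("1000/blah:blah", "1001/blah:blah"));
        String hit = firstInteresting(Arrays.asList(a, b), k -> k.startsWith("0003/"));
        System.out.println(hit + " found; advances on the second file: " + b.advances);
    }
}
```

In this sketch the second file is seeded into the queue but never advanced, 
because its first key sorts after the match. That is the saving described 
above: files positioned at larger rows are not searched for a matching key.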

> If millions of columns in a column family, hbase scanner won't come up
> ----------------------------------------------------------------------
>
>                 Key: HBASE-867
>                 URL: https://issues.apache.org/jira/browse/HBASE-867
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.20.0
>
>
> Our Daniel has uploaded a table that has a column family with millions of 
> columns in it.  He can get items from the table promptly specifying row and 
> column.  Scanning is another matter.  Thread dumping I see we're stuck in the 
> scanner constructor nexting through cells.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
