Re: map reduce range of records from hbase table

stack Fri, 10 Oct 2008 21:20:04 -0700

Jaeyun Noh wrote:

I wonder if a network RPC is involved whenever we call next() on the scanner
class.

It's not a pretty story. A next() in the client makes a trip over to the server carrying the region that hosts the row the scanner is currently stalled on. Server side, the region has a Scanner context that has within it a scanner on the memcache and then a scanner for each of the store files present in the filesystem. The store file scanners in turn reduce to Hadoop MapFile#next calls, so another network hop is involved, out to the particular datanode hosting the MapFile block the scanner is currently within. The next on the server side is a careful nexting through the memcache first and then through each of the store files, respecting order, trying to turn up the appropriate next result.
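To make the server-side "nexting" above concrete, here is a minimal sketch of an ordered merge across several sorted sources, one standing in for the memcache scanner and the others for store file scanners. The class and method names (MergedScanSketch, Source, mergedScan) are hypothetical and are not HBase's actual classes; rows are reduced to bare keys and duplicate keys are simply collapsed, which is a simplification of whatever the real region scanner does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch only: these are not HBase classes.
class MergedScanSketch {
    // One sorted source of row keys; stands in for the memcache scanner
    // or for a single store file scanner.
    static final class Source {
        final Iterator<String> it;
        String head; // next key this source would return, or null when exhausted

        Source(List<String> sortedKeys) {
            it = sortedKeys.iterator();
            advance();
        }

        void advance() {
            head = it.hasNext() ? it.next() : null;
        }
    }

    // Returns the keys of all sources in sorted order, collapsing duplicates.
    static List<String> mergedScan(List<List<String>> sortedSources) {
        // The heap always exposes the source whose current key is smallest.
        PriorityQueue<Source> heap =
            new PriorityQueue<>((a, b) -> a.head.compareTo(b.head));
        for (List<String> keys : sortedSources) {
            Source s = new Source(keys);
            if (s.head != null) {
                heap.add(s);
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Source s = heap.poll();
            // Emit the key unless an earlier source already produced it.
            if (out.isEmpty() || !out.get(out.size() - 1).equals(s.head)) {
                out.add(s.head);
            }
            s.advance();
            if (s.head != null) {
                heap.add(s); // re-queue the source at its new position
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> memcache = Arrays.asList("row2", "row5");
        List<String> storeFileA = Arrays.asList("row1", "row5", "row7");
        List<String> storeFileB = Arrays.asList("row3");
        System.out.println(
            mergedScan(Arrays.asList(memcache, storeFileA, storeFileB)));
    }
}
```

In this toy version every advance() is a local iterator step. In the setup described above, advancing a store file scanner can mean a further hop out to a datanode, which is where the cost of each next comes from.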

Also, whether the scanner works by sending parallel requests to HRegions and
fetching into a temporary cache in the HBase client.

Well, a scanner will be homed on a single row at a time only, so it will be against a single region only at any one time. That said, at the moment, if a row comprises many column families, we proceed through each in series. I believe there is an issue to parallelize the requests across all the column families in a row.

If so, we're happy to live with that.

Is the following hbase parameter related to my question?

<property>
    <name>hbase.client.scanner.caching</name>
    <value>30</value>
    <description>Number of rows that will be fetched when calling next
    on a scanner if it is not served from memory. Higher caching values
    will enable faster scanners but will eat up more memory and some
    calls of next may take longer and longer times when the cache is empty.
    </description>
</property>

Yes. It was just added. It fetches a bunch of rows at a time rather than one at a time as it used to. In my testing, it makes scanners 4X faster.
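A rough way to see why batching helps is to count round trips. The sketch below simulates a client-side scanner that refills a local cache with up to `caching` rows per fetch. FakeServer, CachingScanner and rpcsToScan are hypothetical names invented for illustration, not the HBase client API, and each fetch() call is assumed to cost exactly one RPC.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of client-side scanner caching; not the HBase client API.
class ScannerCachingSketch {
    // Stand-in for the region server: every fetch() call counts as one RPC.
    static final class FakeServer {
        final int totalRows;
        int position = 0;
        int rpcCount = 0;

        FakeServer(int totalRows) {
            this.totalRows = totalRows;
        }

        // Returns up to maxRows rows, or an empty list once the table is exhausted.
        List<String> fetch(int maxRows) {
            rpcCount++;
            List<String> batch = new ArrayList<>();
            while (batch.size() < maxRows && position < totalRows) {
                batch.add("row" + position++);
            }
            return batch;
        }
    }

    // Client-side scanner: serves next() from a local cache and only goes
    // to the server when that cache is empty.
    static final class CachingScanner {
        final FakeServer server;
        final int caching;
        final Deque<String> cache = new ArrayDeque<>();

        CachingScanner(FakeServer server, int caching) {
            this.server = server;
            this.caching = caching;
        }

        // Returns the next row, or null when there are no more rows.
        String next() {
            if (cache.isEmpty()) {
                cache.addAll(server.fetch(caching));
            }
            return cache.poll();
        }
    }

    // Scans the whole fake table and reports how many RPCs it took.
    static int rpcsToScan(int totalRows, int caching) {
        FakeServer server = new FakeServer(totalRows);
        CachingScanner scanner = new CachingScanner(server, caching);
        while (scanner.next() != null) {
            // Drain the scanner; only the RPC count matters here.
        }
        return server.rpcCount;
    }

    public static void main(String[] args) {
        System.out.println("caching=1:  " + rpcsToScan(300, 1) + " RPCs");
        System.out.println("caching=30: " + rpcsToScan(300, 30) + " RPCs");
    }
}
```

For a 300-row fake table this model needs 301 fetches with caching of 1 and 11 with caching of 30 (the last fetch in each case is the empty one that signals the end). That only counts round trips; it says nothing about wall-clock speedup, which also depends on the server-side work behind each next, and the cached rows sit in client memory as the property description warns.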


St.Ack
