Jaeyun Noh wrote:
I wonder whether a network RPC is involved every time we call next() on the
scanner class.

It's not a pretty story. A next() in the client makes a trip over to the server carrying the region that hosts the row the scanner is currently positioned on. Server-side, the region has a scanner context that holds a scanner on the memcache plus a scanner for each of the store files present in the filesystem. The store file scanners in turn reduce to Hadoop MapFile#next calls, so another network hop is involved, out to the particular datanode hosting the MapFile block the scanner is currently within. The next() on the server side carefully steps through the memcache first and then through each of the store files, respecting order, to turn up the appropriate next result.
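
To make the server-side "careful nexting" concrete, here is a minimal Python sketch of merging several sorted sources into one ordered stream. It is only an illustration of the ordering idea, not HBase's actual code: the function names, the (row, value) tuples, and the rule that the memcache wins on a duplicate row are all assumptions made for the example, and it ignores columns, timestamps, and deletes.

```python
import heapq


def _tag(index, source):
    """Attach the source index so equal rows sort by source priority."""
    for row, value in source:
        yield row, index, value


def merged_next(memcache, store_files):
    """Yield (row, value) pairs in row order from several sorted sources.

    Hypothetical model: memcache and each store file are iterables of
    (row, value) sorted by row. The memcache is listed first, so it wins
    when the same row shows up in more than one source.
    """
    sources = [memcache] + list(store_files)
    tagged = [_tag(i, src) for i, src in enumerate(sources)]
    last_row = None
    # heapq.merge pulls lazily from each source, one entry at a time.
    for row, _index, value in heapq.merge(*tagged):
        if row != last_row:  # drop older copies of a row already emitted
            yield row, value
            last_row = row
```

Because heapq.merge is lazy, each result pulls just one more entry from whichever source is furthest behind, which is roughly why a single next can touch the memcache and several store files.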

Also, whether the scanner works by making parallel requests to HRegions and
fetching into a temporary cache in the HBase client.

Well, a scanner is homed on a single row at a time, so it runs against a single region at any one time. That said, at the moment, if a row comprises many column families, we proceed through each in series. I believe there is an issue to parallelize the requests across all the column families in a row.


If so, we're happy to live with that.

Is the following hbase parameter related to my question?

<property>
  <name>hbase.client.scanner.caching</name>
  <value>30</value>
  <description>Number of rows that will be fetched when calling next
  on a scanner if it is not served from memory. Higher caching values
  will enable faster scanners but will eat up more memory and some
  calls of next may take longer and longer times when the cache is empty.
  </description>
</property>
Yes. It was just added. It fetches a bunch of rows at a time rather than one at a time as it used to. In my testing, it makes scanners 4X faster.
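
A rough Python sketch of why batching helps, counting round trips only. FakeRegionServer, CachingScanner, and count_rpcs are hypothetical names invented for this illustration and are not the HBase client API; the caching argument plays the role of hbase.client.scanner.caching.

```python
from collections import deque


class FakeRegionServer:
    """Hypothetical stand-in for a region server; counts round trips."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.rpc_count = 0

    def fetch(self, start, limit):
        self.rpc_count += 1  # each call models one network RPC
        return self.rows[start:start + limit]


class CachingScanner:
    """Client-side scanner that refills a local cache in batches."""

    def __init__(self, server, caching=30):
        self.server = server
        self.caching = caching
        self.cache = deque()
        self.position = 0

    def next(self):
        if not self.cache:  # cache empty: go to the server for a batch
            batch = self.server.fetch(self.position, self.caching)
            self.position += len(batch)
            self.cache.extend(batch)
        return self.cache.popleft() if self.cache else None


def count_rpcs(n_rows, caching):
    """Scan n_rows to the end and return how many RPCs it took."""
    server = FakeRegionServer(range(n_rows))
    scanner = CachingScanner(server, caching)
    while scanner.next() is not None:
        pass
    return server.rpc_count
```

In this toy model, scanning 90 rows costs 91 round trips with caching of 1 and 4 with caching of 30 (three full batches plus one empty fetch that signals the end). It also shows the trade-off the description mentions: the next that finds the cache empty pays for the whole batch.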

St.Ack
