[jira] Commented: (HADOOP-1398) Add in-memory caching of data

Tom White (JIRA) Wed, 16 Jan 2008 05:58:59 -0800

    [ https://issues.apache.org/jira/browse/HADOOP-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559498#action_12559498 ]


Tom White commented on HADOOP-1398:
-----------------------------------

bq. You pass 'length' in the code below, but it's not used:

It is used in the subclass of SequenceFile.Reader by BlockFSInputStream.
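BlockFSInputStream itself isn't shown here. As a hedged illustration only, this assumes the file length is needed to bound the final block, because a block-oriented stream cannot assume every block is full. Under that assumption, the offset arithmetic might look like this. `BlockMath` and its methods are hypothetical names, not code from the patch.

```java
// Hypothetical helper (not from hadoop-blockcache.patch): maps byte offsets in a
// file of known length onto fixed-size blocks, clamping the final block.
public final class BlockMath {
    private BlockMath() {}

    /** Index of the block that contains byte offset {@code pos}. */
    public static long blockIndex(long pos, int blockSize) {
        return pos / blockSize;
    }

    /** Number of bytes in block {@code index}; only the last block may be short. */
    public static int blockLength(long index, int blockSize, long fileLength) {
        long start = index * blockSize;
        if (start >= fileLength) {
            throw new IllegalArgumentException("block starts past end of file");
        }
        // Without fileLength we could not know how much of the last block exists.
        return (int) Math.min(blockSize, fileLength - start);
    }

    public static void main(String[] args) {
        int blockSize = 64 * 1024;
        long fileLength = 150_000L;
        long last = blockIndex(fileLength - 1, blockSize);
        System.out.println("last block = " + last
            + ", length = " + blockLength(last, blockSize, fileLength));
    }
}
```

For a 150,000-byte file with 64KB blocks this reports block 2 as the last block, holding 18,928 bytes.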

bq. Do you have any numbers for how it improves throughput when cached blocks 
are 'hot'?

I haven't got any numbers yet (I'm working on them), but random reads will 
generally suffer, since a whole 64KB block is retrieved just to read a single 
key/value pair. The Bigtable paper discusses reducing the block size to 8KB 
(see section 7).
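The random-read cost can be made concrete with some back-of-the-envelope arithmetic. This assumes a hypothetical 1KB key/value pair that lies entirely within one block. A point lookup fetches the whole block, so the bytes read per useful byte scale directly with block size.

```java
// Back-of-the-envelope read amplification for a single-key lookup, assuming the
// record sits entirely inside one block (the 1KB record size is hypothetical).
public final class ReadAmplification {
    private ReadAmplification() {}

    /** Bytes fetched from disk per byte actually needed. */
    public static double amplification(int blockSize, int recordSize) {
        return (double) blockSize / recordSize;
    }

    public static void main(String[] args) {
        int record = 1024; // hypothetical key/value size in bytes
        System.out.println("64KB blocks: " + amplification(64 * 1024, record) + "x");
        System.out.println(" 8KB blocks: " + amplification(8 * 1024, record) + "x");
    }
}
```

Under this assumption, moving from 64KB to 8KB blocks cuts the bytes fetched per lookup by a factor of eight. The trade-off is a larger block index and less benefit for sequential scans.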

bq. What do we need to add to make it easy to enable/disable this feature on 
a per-column basis? Currently, editing a column's configuration requires 
taking the column offline. Changing this configuration looks safe to do while 
the column stays online. Would you agree?

Agreed. I think that dynamically editing a column descriptor should go in a 
separate jira issue. For now, I was planning on just adding the new parameters 
to HColumnDescriptor. Does the version number need bumping in this case?
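On the version question, one common pattern for Writable-style serialization is to write a version byte first. The reader then reads a new field only when the stored version is new enough, so descriptors written before the change still load. This sketch uses only java.io. `ColumnDescriptorSketch`, its fields, and the version numbers 1 and 2 are hypothetical stand-ins, not the actual HColumnDescriptor code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a column descriptor; NOT the real HColumnDescriptor.
public class ColumnDescriptorSketch {
    // Illustrative versions: 1 = original layout, 2 = adds the block-cache flag.
    static final byte CURRENT_VERSION = 2;

    String name = "";
    int maxVersions = 3;
    boolean blockCacheEnabled = false; // hypothetical new per-column parameter

    public void write(DataOutput out) throws IOException {
        out.writeByte(CURRENT_VERSION); // the version always goes first
        out.writeUTF(name);
        out.writeInt(maxVersions);
        out.writeBoolean(blockCacheEnabled);
    }

    public void readFields(DataInput in) throws IOException {
        byte version = in.readByte();
        if (version > CURRENT_VERSION) {
            throw new IOException("Unknown descriptor version " + version);
        }
        name = in.readUTF();
        maxVersions = in.readInt();
        if (version >= 2) {
            blockCacheEnabled = in.readBoolean();
        } else {
            blockCacheEnabled = false; // field absent in version 1: use the default
        }
    }

    /** Deserializes a descriptor from raw bytes. */
    static ColumnDescriptorSketch fromBytes(byte[] data) {
        try {
            ColumnDescriptorSketch d = new ColumnDescriptorSketch();
            d.readFields(new DataInputStream(new ByteArrayInputStream(data)));
            return d;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Serializes using the current version. */
    byte[] toBytes() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            write(new DataOutputStream(bytes));
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Produces bytes in the older version 1 layout, which has no cache flag. */
    static byte[] version1Bytes(String name, int maxVersions) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeByte(1);
            out.writeUTF(name);
            out.writeInt(maxVersions);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ColumnDescriptorSketch old = fromBytes(version1Bytes("contents:", 3));
        System.out.println(old.name + " blockCache=" + old.blockCacheEnabled);
    }
}
```

Under this scheme, adding a field does require bumping the version. Otherwise a reader cannot tell which layout it is reading.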

> Add in-memory caching of data
> -----------------------------
>
>                 Key: HADOOP-1398
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1398
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>            Reporter: Jim Kellerman
>            Priority: Trivial
>         Attachments: hadoop-blockcache.patch
>
>
> Bigtable provides two in-memory caches: one for row/column data and one for 
> disk block caches.
> The size of each cache should be configurable, data should be loaded lazily, 
> and the cache managed by an LRU mechanism.
> One complication of the block cache is that all data is read through a 
> SequenceFile.Reader which ultimately reads data off of disk via an RPC proxy 
> for ClientProtocol. This would imply that the block caching would have to be 
> pushed down to either the DFSClient or SequenceFile.Reader.
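The issue asks for a configurable cache size, lazy loading, and LRU eviction. A minimal in-process sketch of those three properties, keyed by block index, can be built on `java.util.LinkedHashMap` in access order with `removeEldestEntry` overridden. `LruBlockCache` and its loader function are hypothetical, and the size bound here counts entries rather than bytes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Minimal LRU cache sketch: bounded by entry count, loads values lazily on a miss.
// Class and method names are hypothetical, not a Hadoop or HBase API.
public class LruBlockCache<K, V> {
    private final Map<K, V> map;
    private final Function<K, V> loader;
    private int hits;
    private int misses;

    public LruBlockCache(final int maxEntries, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder=true: iteration order runs from least to most recently accessed.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // evict once the configured size is exceeded
            }
        };
    }

    /** Returns the cached value, loading (and caching) it on a miss. */
    public synchronized V get(K key) {
        V value = map.get(key); // a hit also marks the entry as most recently used
        if (value != null) {
            hits++;
            return value;
        }
        misses++;
        value = loader.apply(key); // lazy load, e.g. a disk read of one block
        map.put(key, value);
        return value;
    }

    public synchronized boolean contains(K key) { return map.containsKey(key); }
    public synchronized int hits() { return hits; }
    public synchronized int misses() { return misses; }

    public static void main(String[] args) {
        // Hypothetical loader standing in for reading one 64KB block from disk.
        LruBlockCache<Long, byte[]> cache =
            new LruBlockCache<>(2, blockIndex -> new byte[64 * 1024]);
        cache.get(0L);
        cache.get(1L);
        cache.get(0L); // hit; block 0 becomes most recently used
        cache.get(2L); // miss; evicts block 1, the least recently used
        System.out.println("hits=" + cache.hits() + " misses=" + cache.misses()
            + " has1=" + cache.contains(1L));
    }
}
```

Bounding by entry count keeps the sketch short. A production block cache would more plausibly bound total bytes. Also, because `get` holds a single lock while loading, concurrent misses are serialized.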
