[ 
https://issues.apache.org/jira/browse/HADOOP-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559208#action_12559208
 ] 

stack commented on HADOOP-1398:
-------------------------------

Patch looks great Tom.

You pass 'length' in the below, but it's not used:

{code}
+    protected FSDataInputStream openFile(FileSystem fs, Path file,
+        int bufferSize, long length) throws IOException {
+      return fs.open(file, bufferSize);
{code}

I presume you have plans for it later?

Do you have confidence in the LruMap class?  You don't have unit tests (though 
these things are hard to test).  I ask because, though small, these kinds of 
classes can prove a little tricky...
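For reference, a minimal sketch of what an LRU map can look like, assuming it is essentially a bounded, access-ordered map (the name LruMapSketch and the capacity parameter are illustrative, not the patch's actual class): java.util.LinkedHashMap already provides the eviction hook, which keeps the tricky ordering logic in the library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a bounded LRU map built on LinkedHashMap.
public class LruMapSketch<K, V> extends LinkedHashMap<K, V> {
  private final int capacity;

  public LruMapSketch(int capacity) {
    // accessOrder=true moves an entry to the tail on get(),
    // so the head of the map is always the least-recently-used entry.
    super(16, 0.75f, true);
    this.capacity = capacity;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    // Called by put(); returning true evicts the LRU entry
    // once the map grows past capacity.
    return size() > capacity;
  }
}
```

Reusing LinkedHashMap this way sidesteps most of the hand-rolled linked-list bookkeeping where these classes usually go wrong, and makes the eviction behavior easy to unit-test.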

Do you have any numbers for how it improves throughput when cached blocks are 
'hot'?  And you talked of a slight 'cost'.  Do you have rough numbers for that 
too?  (When experimenting on the datanode with adjusting the size of the CRC 
blocks, a similar type of blocking to what you have here, there was no 
discernible difference between sizes.)

What do we need to add to make it easy to enable/disable this feature on a 
per-column basis?  Currently, edits to column configuration require taking the 
column offline.  Changing this particular configuration looks safe to do while 
the column stays online.  Would you agree?

> Add in-memory caching of data
> -----------------------------
>
>                 Key: HADOOP-1398
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1398
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>            Reporter: Jim Kellerman
>            Priority: Trivial
>         Attachments: hadoop-blockcache.patch
>
>
> Bigtable provides two in-memory caches: one for row/column data and one for 
> disk block caches.
> The size of each cache should be configurable, data should be loaded lazily, 
> and the cache managed by an LRU mechanism.
> One complication of the block cache is that all data is read through a 
> SequenceFile.Reader, which ultimately reads data off of disk via an RPC proxy 
> for ClientProtocol. This would imply that the block caching would have to be 
> pushed down to either the DFSClient or SequenceFile.Reader.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.