[
https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731143#comment-13731143
]
Suresh Srinivas edited comment on HDFS-4949 at 8/6/13 7:24 PM:
---------------------------------------------------------------
My notes from the meeting:
Enabling this feature on the Windows platform requires the following:
# Need a Unix domain sockets equivalent.
# mmap and munmap are done using Java and should not require any Windows-specific changes.
# mlock: is there a Windows equivalent?
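The mmap/munmap point above can be illustrated with a minimal sketch (file names here are illustrative only): java.nio's FileChannel.map provides a platform-independent equivalent of mmap, which is why no Windows-specific changes should be needed for that part.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapSketch {
    public static void main(String[] args) throws IOException {
        // Create a small temp file to map; the path is illustrative only.
        Path p = Files.createTempFile("block", ".data");
        Files.write(p, "hello".getBytes(StandardCharsets.US_ASCII));
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            // Platform-independent mmap equivalent: same call on Windows and Unix.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] out = new byte[(int) ch.size()];
            buf.get(out);
            System.out.println(new String(out, StandardCharsets.US_ASCII));
        } finally {
            Files.delete(p);
        }
        // Java has no explicit munmap; the mapping is released when the buffer
        // is garbage-collected (implementations may force this via cleaners).
    }
}
```

Note that mlock has no counterpart in java.nio at all, which is why it stands out as the hard part of the Windows port.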
Quota for the datanode cache is counted against the pool.
Design needs to cover the following scenarios in more detail:
# Two pools caching the same file, and how quota is counted in that case.
# Resource failures and how they affect existing caches for the pools. Perhaps
pools should have priorities.
#* Scenario 1: a resource failure takes down cached data. In the first cut, no
new cached replicas will be created.
#* Scenario 2: resources fail while cluster capacity is low; the application,
even if higher priority, will not get cache quota.
# Caching is supported only for whole files for now.
# Only completed blocks will be cached. This also applies to files that are
being written.
# Symlink paths will not be cached.
# Need more detail on enabling caching for a directory and how newly created
files (on completion of write) are added to the cache. This also has quota
implications and requires handling failures when the quota is reached or
resources are unavailable for such automatic caching to work.
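One possible (purely illustrative) answer to the two-pools question above is to charge each pool's quota independently for the bytes it requests, even when another pool already caches the same blocks. All class and method names below are hypothetical, not HDFS APIs:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: per-pool quota accounting when two pools cache the same file.
public class PoolQuotaSketch {
    static class Pool {
        final long quotaBytes;
        long usedBytes;
        Pool(long quotaBytes) { this.quotaBytes = quotaBytes; }
        // Charge this pool even if another pool caches the same blocks,
        // so each pool's usage reflects what it asked to cache.
        boolean tryCharge(long bytes) {
            if (usedBytes + bytes > quotaBytes) {
                return false;  // request would exceed the pool's quota
            }
            usedBytes += bytes;
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Pool> pools = new HashMap<>();
        pools.put("etl", new Pool(1024));
        pools.put("bi", new Pool(512));
        long fileBytes = 600;
        // Both pools request caching of the same 600-byte file.
        System.out.println(pools.get("etl").tryCharge(fileBytes)); // fits in 1024
        System.out.println(pools.get("bi").tryCharge(fileBytes));  // exceeds 512
    }
}
```

The alternative, splitting the charge among pools that share a file, would make usage fluctuate as other pools come and go, which is part of what the design document needs to settle.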
We should add a TTL to caching requests and expire the corresponding cache entries.
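The TTL idea could look roughly like this (hypothetical names, not actual HDFS code): each cache request records an absolute expiry time, and an expiry scan drops entries that have passed it.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a TTL on cache requests; names are illustrative only.
public class CacheTtlSketch {
    static class CacheRequest {
        final String path;
        final long expiryMillis;  // absolute time after which the entry expires
        CacheRequest(String path, long ttlMillis, long nowMillis) {
            this.path = path;
            this.expiryMillis = nowMillis + ttlMillis;
        }
        boolean isExpired(long nowMillis) {
            return nowMillis >= expiryMillis;
        }
    }

    public static void main(String[] args) {
        long now = 0L;  // fake clock so the example is deterministic
        CacheRequest r = new CacheRequest("/data/part-0000",
            TimeUnit.HOURS.toMillis(1), now);
        System.out.println(r.isExpired(now));                               // just created
        System.out.println(r.isExpired(now + TimeUnit.HOURS.toMillis(2)));  // past TTL
    }
}
```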
Let's refresh the design document based on the discussions from the meeting.
> Centralized cache management in HDFS
> ------------------------------------
>
> Key: HDFS-4949
> URL: https://issues.apache.org/jira/browse/HDFS-4949
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode, namenode
> Affects Versions: 3.0.0, 2.3.0
> Reporter: Andrew Wang
> Assignee: Andrew Wang
> Attachments: caching-design-doc-2013-07-02.pdf
>
>
> HDFS currently has no support for managing or exposing in-memory caches at
> datanodes. This makes it harder for higher level application frameworks like
> Hive, Pig, and Impala to effectively use cluster memory, because they cannot
> explicitly cache important datasets or place their tasks for memory locality.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira