[
https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731143#comment-13731143
]
Suresh Srinivas edited comment on HDFS-4949 at 8/6/13 7:24 PM:
---------------------------------------------------------------
My notes from the meeting:
Enabling this feature on the Windows platform requires the following:
# Need a Unix domain sockets equivalent.
# mmap and munmap are done using Java and should not require any Windows-specific changes.
# mlock: is there a Windows equivalent?
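The mmap/munmap point above can be illustrated with a minimal sketch (file names here are illustrative only): java.nio's FileChannel.map provides a platform-independent equivalent of mmap, which is why no Windows-specific changes should be needed for that part.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapSketch {
    public static void main(String[] args) throws IOException {
        // Create a small temp file to map; the path is illustrative only.
        Path p = Files.createTempFile("block", ".data");
        Files.write(p, "hello".getBytes(StandardCharsets.US_ASCII));
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            // Platform-independent mmap equivalent: same call on Windows and Unix.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] out = new byte[(int) ch.size()];
            buf.get(out);
            System.out.println(new String(out, StandardCharsets.US_ASCII));
        } finally {
            Files.delete(p);
        }
        // Java has no explicit munmap; the mapping is released when the buffer
        // is garbage-collected (implementations may force this via cleaners).
    }
}
```

Note that mlock has no counterpart in java.nio at all, which is why it stands out as the hard part of the Windows port.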
Quota for the datanode cache is counted against the pool.
Design needs to cover the following scenarios in more detail:
# Two pools caching the same file, and how quota is counted in that case.
# Resource failures and how they affect existing caches for the pools. Perhaps
pools should have priorities.
#* Scenario 1: a resource failure takes down cached data. In the first cut, no
new cached replicas will be created.
#* Scenario 2: resources fail while cluster capacity is low; the application,
even if higher priority, will not get cache quota.
# Caching is supported only for whole files for now.
# Only completed blocks will be cached. This also applies to files that are
being written.
# Symlink paths will not be cached.
# Need more detail on enabling caching for a directory and how newly created
files (on completion of write) are added to the cache. This also has quota
implications and requires handling failures when the quota is reached or
resources are unavailable for such automatic caching to work.
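One possible (purely illustrative) answer to the two-pools question above is to charge each pool's quota independently for the bytes it requests, even when another pool already caches the same blocks. All class and method names below are hypothetical, not HDFS APIs:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: per-pool quota accounting when two pools cache the same file.
public class PoolQuotaSketch {
    static class Pool {
        final long quotaBytes;
        long usedBytes;
        Pool(long quotaBytes) { this.quotaBytes = quotaBytes; }
        // Charge this pool even if another pool caches the same blocks,
        // so each pool's usage reflects what it asked to cache.
        boolean tryCharge(long bytes) {
            if (usedBytes + bytes > quotaBytes) {
                return false;  // request would exceed the pool's quota
            }
            usedBytes += bytes;
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Pool> pools = new HashMap<>();
        pools.put("etl", new Pool(1024));
        pools.put("bi", new Pool(512));
        long fileBytes = 600;
        // Both pools request caching of the same 600-byte file.
        System.out.println(pools.get("etl").tryCharge(fileBytes)); // fits in 1024
        System.out.println(pools.get("bi").tryCharge(fileBytes));  // exceeds 512
    }
}
```

The alternative, splitting the charge among pools that share a file, would make usage fluctuate as other pools come and go, which is part of what the design document needs to settle.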
We should add a TTL to caching requests and expire the corresponding cache entries.
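The TTL idea could look roughly like this (hypothetical names, not actual HDFS code): each cache request records an absolute expiry time, and an expiry scan drops entries that have passed it.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a TTL on cache requests; names are illustrative only.
public class CacheTtlSketch {
    static class CacheRequest {
        final String path;
        final long expiryMillis;  // absolute time after which the entry expires
        CacheRequest(String path, long ttlMillis, long nowMillis) {
            this.path = path;
            this.expiryMillis = nowMillis + ttlMillis;
        }
        boolean isExpired(long nowMillis) {
            return nowMillis >= expiryMillis;
        }
    }

    public static void main(String[] args) {
        long now = 0L;  // fake clock so the example is deterministic
        CacheRequest r = new CacheRequest("/data/part-0000",
            TimeUnit.HOURS.toMillis(1), now);
        System.out.println(r.isExpired(now));                               // just created
        System.out.println(r.isExpired(now + TimeUnit.HOURS.toMillis(2)));  // past TTL
    }
}
```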
Let's refresh the design document based on the discussions from the meeting.
> Centralized cache management in HDFS
> ------------------------------------
>
> Key: HDFS-4949
> URL: https://issues.apache.org/jira/browse/HDFS-4949
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode, namenode
> Affects Versions: 3.0.0, 2.3.0
> Reporter: Andrew Wang
> Assignee: Andrew Wang
> Attachments: caching-design-doc-2013-07-02.pdf
>
>
> HDFS currently has no support for managing or exposing in-memory caches at
> datanodes. This makes it harder for higher level application frameworks like
> Hive, Pig, and Impala to effectively use cluster memory, because they cannot
> explicitly cache important datasets or place their tasks for memory locality.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira