[
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980183#comment-13980183
]
Colin Patrick McCabe commented on HDFS-5851:
--------------------------------------------
I took a quick look at the design doc. I think the focus on "discardable"
memory makes sense in light of next-gen frameworks like Spark, Tez, etc. One
note: Tachyon, Spark's caching layer, does not currently incorporate the
concept of RDDs, although that support is planned, as I understand it. It's
just caching (serialized) files at this point, and I think the semantics match
up pretty well with what we're talking about here. The execution framework can
regenerate the data if needed... that regeneration support does not need to
live in HDFS.
I think that some HDFS applications will want the ability to treat multiple
files as a single eviction unit... i.e., if you evict one file, you evict them
all. (Things like Hive tables are multiple files, but probably ought to be
treated as a single unit for caching purposes.) There are also some questions
about when eviction can occur... it seems like it would be very inconvenient to
do it while the file was being read. On the other hand, we probably need a
timeout to prevent a selfish process (or a process on a disconnected node) from
pinning something in the cache forever by keeping a file open.
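To make that concrete, here is a minimal sketch (hypothetical names, not an HDFS API) of the two semantics above: several files grouped into one eviction unit, and a pin lease with a timeout so a reader keeps the unit cached while it is open but a disconnected client can't pin it forever.

```c
/* Hedged sketch of the semantics discussed above.  The type and
 * function names are hypothetical, not part of HDFS. */
#include <stdbool.h>
#include <time.h>

typedef struct {
    const char **paths;         /* e.g. the files of one Hive table */
    int          npaths;        /* evicting the unit drops them all */
    time_t       lease_expiry;  /* 0 when no reader holds a pin */
} eviction_unit;

/* A reader renews its pin; timeout_s bounds how long the pin can be
 * held without renewal, even if the client never closes the file. */
static void pin_unit(eviction_unit *u, time_t now, int timeout_s) {
    u->lease_expiry = now + timeout_s;
}

/* Eviction of the whole unit is allowed once every lease has expired. */
static bool evict_allowed(const eviction_unit *u, time_t now) {
    return u->lease_expiry <= now;
}
```

A reader that stays connected just keeps renewing its lease; a crashed or partitioned one silently loses the pin when the timeout elapses.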
Clearly we want the ability to do things like skip checksums when reading the
cached files. This will reuse a lot of the HDFS-4949 code. It's less clear
what other aspects of the HDFS-4949 code we'll want to reuse; I think cache
pools might be one such thing. There is also potential to reuse some of the
implementation, such as the mlocking machinery. An mlocked file in /dev/shm
could be a good way to go here.
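The "mlocked file in /dev/shm" idea can be sketched in a few lines of POSIX C (this is illustrative, not HDFS code; the path and length are made up): create a file on tmpfs, mmap it, and mlock the mapping so the cached data stays memory-resident.

```c
/* Minimal sketch: pin a block of data in RAM via a tmpfs file.
 * The path and length are illustrative, not part of HDFS. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 0 if the region was locked, 1 if mlock was denied
 * (e.g. by RLIMIT_MEMLOCK), and -1 on setup failure. */
static int pin_shm_block(const char *path, size_t len) {
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0) return -1;
    if (ftruncate(fd, (off_t)len) != 0) { close(fd); return -1; }

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                     /* the mapping survives the close */
    if (p == MAP_FAILED) return -1;

    int rc = 0;
    if (mlock(p, len) != 0) {      /* on success, pages stay resident */
        perror("mlock");           /* may need CAP_IPC_LOCK or a higher
                                      RLIMIT_MEMLOCK */
        rc = 1;
    }
    memcpy(p, "cached block", 13); /* stand-in for cached block data */

    munlock(p, len);
    munmap(p, len);
    unlink(path);
    return rc;
}
```

Because /dev/shm is tmpfs, the file's pages are already RAM-backed; the mlock on top prevents them from being swapped out, which is essentially what the HDFS-4949 cache does for on-disk blocks.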
I am free all of next week, except for Friday. Let's schedule a webex so we
can figure this stuff out.
> Support memory as a storage medium
> ----------------------------------
>
> Key: HDFS-5851
> URL: https://issues.apache.org/jira/browse/HDFS-5851
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode
> Affects Versions: 3.0.0
> Reporter: Arpit Agarwal
> Assignee: Arpit Agarwal
> Attachments:
> SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf
>
>
> Memory can be used as a storage medium for smaller/transient files for fast
> write throughput.
> More information/design will be added later.
--
This message was sent by Atlassian JIRA
(v6.2#6252)