[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980741#comment-13980741
 ] 

eric baldeschwieler commented on HDFS-5851:
-------------------------------------------

The case of a local short-circuit read having access to the open file is
interesting.  Does this pin the memory until the (possibly misbehaving) client
process closes the socket / FD?

Single replicas?  Why would one want to triple-replicate discardable memory?
One should at least have the option to keep only a single local copy in HDFS.

If we cannot prevent random-access writes to DDM (we could presumably limit
this in the client API), then I don't think we can checksum or replicate until
a file is closed.  My gut says delaying both until close is the right call.

How are discarded or lost (e.g. after a node failure) blocks / files handled?
Do the names remain in the NN and get reported by fsck and other operations?
We want to be sure this doesn't add work for operators.

Can we make these files transient like ZK ephemeral nodes?

Once we assume discardable files don't need to be replicated, we can think
about allocating only an arena name (think directory) in the NN and then
creating individual files only at the DN.  (You could still have remote access
via .../<ARENA>/<DN-NAME>/<name> style URLs.)  This would vastly reduce NN
interactions, which is probably good for both latency and scalability.  You
could then imagine using this mechanism for MR / Tez / Spark shuffle files,
which has been a long-term project goal.  Maybe we should break this idea out
into another JIRA?  Happy to chat if folks want to flesh this out.
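
To make the arena naming idea concrete, here is a minimal sketch of how a
path of the form /<ARENA>/<DN-NAME>/<name> might be split into its parts.
Everything in it is hypothetical: the ArenaPath class, its field names, and
the nameNodeKey() helper are illustrations only and are not part of any
existing HDFS API.  The elided ".../" prefix from the URL above is left out,
so the sketch treats the path as relative to whatever that prefix turns out
to be.

```java
import java.util.Objects;

/** Hypothetical parsed form of an arena-style path: /<arena>/<dn-name>/<name>. */
final class ArenaPath {
    final String arena;     // assumed: the only component registered in the NN
    final String dataNode;  // assumed: the DN holding the single local copy
    final String name;      // assumed: file name known only to that DN

    private ArenaPath(String arena, String dataNode, String name) {
        this.arena = arena;
        this.dataNode = dataNode;
        this.name = name;
    }

    /** Splits "/arena/dn/name" into exactly three non-empty components. */
    static ArenaPath parse(String path) {
        Objects.requireNonNull(path, "path");
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("path must start with '/': " + path);
        }
        // limit -1 keeps trailing empty strings so "/a/b/" is rejected below
        String[] parts = path.substring(1).split("/", -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected /<arena>/<dn-name>/<name>: " + path);
        }
        for (String part : parts) {
            if (part.isEmpty()) {
                throw new IllegalArgumentException("empty path component in: " + path);
            }
        }
        return new ArenaPath(parts[0], parts[1], parts[2]);
    }

    /** In this sketch only the arena would ever be looked up in the NN. */
    String nameNodeKey() {
        return "/" + arena;
    }

    @Override
    public String toString() {
        return "/" + arena + "/" + dataNode + "/" + name;
    }
}
```

The point of the split is that a client would resolve only nameNodeKey()
against the NN, then go to the named DN directly for the file itself, which
is where the reduction in NN interactions would come from.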

Involving YARN in HDFS resource management is interestingly circular.  Is this
needed?  One would want the right abstraction so that other solutions can be
applied to YARN-less deployments.

> Support memory as a storage medium
> ----------------------------------
>
>                 Key: HDFS-5851
>                 URL: https://issues.apache.org/jira/browse/HDFS-5851
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>    Affects Versions: 3.0.0
>            Reporter: Arpit Agarwal
>            Assignee: Arpit Agarwal
>         Attachments: 
> SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf
>
>
> Memory can be used as a storage medium for smaller/transient files for fast 
> write throughput.
> More information/design will be added later.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
