Hello,

I have a question about using data cached in memory via centralized cache management.
I cached the data that I want to use through the CLI (hdfs cacheadmin -addDirective ...).
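Concretely, the commands I ran were along these lines (the pool name and path here are placeholders, not my real ones):

hdfs cacheadmin -addPool cache
hdfs cacheadmin -addDirective -path /user/yoonmin/input -pool cache
hdfs cacheadmin -listDirectives -stats

(As far as I understand, -listDirectives -stats can be used to confirm that the bytes are actually resident in the cache.)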
Then, in my MapReduce application, how can I read that cached data from memory? Here is the relevant source code from my MapReduce application:
System.out.println("Ready
for loading data from Centralized Cache in DataNode");
System.out.println("Connecting
HDFS... at " + hdfsURI.toString());
DFSClient
dfs = new DFSClient(hdfsURI, new Configuration());
CacheDirectiveInfo
info =
new
CacheDirectiveInfo.Builder().setPath(new Path("path in HDFS for cached
data")).setPool("cache").build();
CacheDirectiveEntry
cachedFile = dfs.listCacheDirectives(info).next();
System.out.println("We
got cachedFile! ID: " +
cachedFile.getInfo().getId()
+ ", Path: " + cachedFile.getInfo().getPath() + ", CachedPool:
" + cachedFile.getInfo().getPool());
System.out.println("Open
DFSInputStream to read cachedFile to ByteBuffer");
DFSInputStream
in = dfs.open(cachedFile.getInfo().getPath().toString());
ElasticByteBufferPool
bufPool = new ElasticByteBufferPool();
ByteBuffer
buf = ByteBuffer.allocate(10000);
System.out.println("Generating
Off-Heap ByteBuffer! size: " + buf.capacity());
in.read(buf);
buf.flip();
// Flip: ready for reading data after writing data into buffer
System.out.println("Zero-Copying
cached file into buffer!");
Is this the right source code for using the centralized cache management feature?
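One thing I am unsure about: I suspect the plain in.read(buf) above is just a normal copying read into an on-heap buffer, and that true zero-copy reads of cached data go through the enhanced ByteBuffer read API instead (which would also actually use the bufPool I created above). Here is a sketch of what I think that would look like; the 10000 max length and the read options are just my guesses:

// Additional imports for the sketch below.
import java.util.EnumSet;
import org.apache.hadoop.fs.ReadOption;

// Zero-copy read sketch (my understanding, not verified): SKIP_CHECKSUMS should
// allow HDFS to mmap the cached block directly instead of copying it, since
// cached replicas have already been checksum-verified by the DataNode.
ByteBuffer zbuf = in.read(bufPool, 10000, EnumSet.of(ReadOption.SKIP_CHECKSUMS));
if (zbuf != null) { // null means EOF
    // ... consume zbuf here; it is already positioned for reading ...
    in.releaseBuffer(zbuf); // buffers from this API must be returned when done
}

Is that the intended way to read centrally cached data, or is DFSInputStream.read(ByteBuffer) sufficient?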
Thanks
// Yoonmin Nam