[
https://issues.apache.org/jira/browse/HDDS-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HDDS-4552:
---------------------------------
Labels: pull-request-available (was: )
> Read data from chunk into ByteBuffer[] instead of single ByteBuffer
> -------------------------------------------------------------------
>
> Key: HDDS-4552
> URL: https://issues.apache.org/jira/browse/HDDS-4552
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Reporter: Hanisha Koneru
> Assignee: Hanisha Koneru
> Priority: Major
> Labels: pull-request-available
>
> When a ReadChunk operation is performed, all the data to be read from one
> chunk is read into a single ByteBuffer.
> {code:java}
> // ChunkUtils#readData()
> public static void readData(File file, ByteBuffer buf,
>     long offset, long len, VolumeIOStats volumeIOStats)
>     throws StorageContainerException {
>   .....
>   try {
>     bytesRead = processFileExclusively(path, () -> {
>       try (FileChannel channel = open(path, READ_OPTIONS, NO_ATTRIBUTES);
>            FileLock ignored = channel.lock(offset, len, true)) {
>         return channel.read(buf, offset);
>       } catch (IOException e) {
>         throw new UncheckedIOException(e);
>       }
>     });
>   } catch (UncheckedIOException e) {
>     throw wrapInStorageContainerException(e.getCause());
>   }
>   .....
>   .....
> {code}
> This Jira proposes to read the data from the channel and put it into an array
> of ByteBuffers each with a set capacity. This capacity can be configurable.
> This would help with optimizing Ozone InputStreams in terms of cached memory.
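
To make the proposal concrete, here is a minimal sketch of reading a byte range into an array of fixed-capacity buffers. It is not the Ozone implementation: the class ChunkReadSketch, its allocate() helper, and the bufferCapacity parameter are hypothetical names chosen for illustration, and the file locking, VolumeIOStats accounting, and StorageContainerException wrapping from the snippet above are left out. It relies only on the standard scattering read, FileChannel.read(ByteBuffer[]).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not part of ChunkUtils.
final class ChunkReadSketch {
  private ChunkReadSketch() { }

  /** Allocates buffers of at most bufferCapacity bytes that together hold len bytes. */
  static ByteBuffer[] allocate(long len, int bufferCapacity) {
    if (len < 0 || bufferCapacity <= 0) {
      throw new IllegalArgumentException("len must be >= 0 and capacity > 0");
    }
    int count = (int) ((len + bufferCapacity - 1) / bufferCapacity);
    ByteBuffer[] buffers = new ByteBuffer[count];
    long remaining = len;
    for (int i = 0; i < count; i++) {
      // The last buffer may be smaller than bufferCapacity.
      int size = (int) Math.min(bufferCapacity, remaining);
      buffers[i] = ByteBuffer.allocate(size);
      remaining -= size;
    }
    return buffers;
  }

  /** Fills the buffers in order, starting at offset; returns the bytes read. */
  static long readData(Path path, ByteBuffer[] buffers, long offset)
      throws IOException {
    long expected = 0;
    for (ByteBuffer b : buffers) {
      expected += b.remaining();
    }
    long total = 0;
    try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
      channel.position(offset);
      while (total < expected) {
        long n = channel.read(buffers); // scattering read
        if (n < 0) {
          break; // reached end of file before filling every buffer
        }
        total += n;
      }
    }
    for (ByteBuffer b : buffers) {
      b.flip(); // switch each buffer from writing to reading
    }
    return total;
  }
}
```

With len = 7 and bufferCapacity = 3, allocate() returns buffers of 3, 3 and 1 bytes, so the caller still gets the full requested length from a single call, just split across several buffers.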
> Currently, data in ChunkInputStream is cached until either the stream is
> closed or the chunk EOF is reached. This can lead to up to 4MB (the default
> ChunkSize) of data being cached in memory per ChunkInputStream.
> After the proposed change, we can optimize ChunkInputStream to release a
> ByteBuffer as soon as that ByteBuffer is read instead of waiting to read the
> whole chunk (HDDS-4553). Read I/O performance will not be affected, as the
> read from the DN still returns the requested length of data in one go. The
> only difference is that the data is returned in an array of ByteBuffers
> instead of a single ByteBuffer.
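
The memory benefit described above can be sketched as follows. This is only an illustration of the idea, not the HDDS-4553 change itself: ReleasingBufferReader and retainedBuffers() are hypothetical names, and the real ChunkInputStream has additional state (seek, checksums, and so on) that is ignored here. The reader drops its reference to each buffer as soon as the last byte of that buffer has been consumed, so at most the unread buffers stay reachable.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not the actual ChunkInputStream.
final class ReleasingBufferReader {
  private final ByteBuffer[] buffers;
  private int index;

  ReleasingBufferReader(ByteBuffer[] buffers) {
    this.buffers = buffers;
  }

  /** Returns the next byte as an int in [0, 255], or -1 when all data is read. */
  int read() {
    while (index < buffers.length) {
      ByteBuffer current = buffers[index];
      if (current != null && current.hasRemaining()) {
        int value = current.get() & 0xFF;
        if (!current.hasRemaining()) {
          // Fully consumed: drop the reference so the buffer can be collected.
          buffers[index++] = null;
        }
        return value;
      }
      // Empty buffer: release it and move on to the next one.
      buffers[index++] = null;
    }
    return -1;
  }

  /** Number of buffers still held in memory. */
  int retainedBuffers() {
    int count = 0;
    for (ByteBuffer b : buffers) {
      if (b != null) {
        count++;
      }
    }
    return count;
  }
}
```

With a single 4MB buffer, nothing can be released until the whole chunk has been read; with, say, sixteen 256KB buffers (an assumed capacity, since the value is meant to be configurable), the retained memory shrinks step by step as the reader advances.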