[
https://issues.apache.org/jira/browse/HDFS-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775611#comment-13775611
]
Chris Nauroth commented on HDFS-5191:
-------------------------------------
It looks like there are still some import-only changes in {{DFSUtil}}.
{code}
if (buffer == null) {
throw new UnsupportedOperationException("zero-copy reads " +
"were not available, and the ByteBufferPool did not provide " +
"us with a " + (useDirect ? " direct" : "indirect") +
"buffer.");
}
{code}
Minor nit: this exception message would have an extraneous space in the direct
case (i.e. "did not provide us with a direct"), and no space between the last
two words (i.e. "did not provide us with a indirectbuffer").
{code}
while (true) {
countingVisitor.reset();
mmapManager.visitEvictable(countingVisitor);
if (0 == countingVisitor.count) {
break;
}
}
{code}
This test would cause an infinite loop if a bug was introduced that left mmaps
lingering in evictable state, which could be hard to diagnose. Should we use
{{GenericTestUtils#waitFor}}, so that there is a timeout?
Lastly, can you help clarify something about the data structure behind
{{IdentityHashStore}} for me? In particular, I'm wondering about the math in
{{realloc}}:
{code}
private void realloc(int newCapacity) {
Preconditions.checkArgument(newCapacity > 0);
Object prevBuffer[] = buffer;
this.capacity = newCapacity;
int numElements = 1 + (2 * newCapacity);
this.buffer = new Object[2 * numElements];
this.numInserted = 0;
if (prevBuffer != null) {
for (int i = 0; i < prevBuffer.length; i += 2) {
if (prevBuffer[i] != null) {
putInternal(prevBuffer[i], prevBuffer[i + 1]);
}
}
}
}
{code}
{{put}} will call {{realloc}} to double capacity when needed. {{numElements}}
is double of the new capacity plus one extra. Then, the size is doubled again
for allocating the new array. Therefore the growth pattern looks like:
capacity == 2, buffer.length == 10
capacity == 4, buffer.length == 18
capacity == 8, buffer.length == 34
It looks like this causes a fair amount of extra unused slack in the array,
because only 2 * {{numInserted}} elements are used at a time. Is the slack
required, or should {{realloc}} only double the size once instead of twice?
Also, I wasn't sure why we needed 1 extra in the calculation of
{{numElements}}. Is that a defensive thing so that the loop in {{putInternal}}
always terminates? I would expect the check of capacity in {{put}} to be
sufficient to prevent that without needing an extra sentinel element.
On to the libhdfs changes...
> revisit zero-copy API in FSDataInputStream to make it more intuitive
> --------------------------------------------------------------------
>
> Key: HDFS-5191
> URL: https://issues.apache.org/jira/browse/HDFS-5191
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs-client, libhdfs
> Affects Versions: HDFS-4949
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: HDFS-5191-caching.001.patch,
> HDFS-5191-caching.003.patch, HDFS-5191-caching.006.patch,
> HDFS-5191-caching.007.patch, HDFS-5191-caching.008.patch
>
>
> As per the discussion on HDFS-4953, we should revisit the zero-copy API to
> make it more intuitive for new users.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira