[ https://issues.apache.org/jira/browse/HDFS-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jakob Homan updated HDFS-1353:
------------------------------
Summary: Remove most of getBlockLocation optimization (was: Optimize
number of block access tokens returned by getBlockLocations)
Description:
<This description is not valid. See comment.>
HDFS-1081 optimized the number of block access tokens (BATs) created in a
single call to getBlockLocations, as this is an expensive operation. However,
that JIRA put off another optimization which was then made possible, which is
to just send a single block access token across the wire (and maintain a single
BAT on the client side). This JIRA is for implementing that optimization.
Since a single BAT is generated for all the blocks, we just write that single
BAT to the wire, rather than writing n BATs for n blocks, as is currently done.
This turns out to be a useful optimization for files with very large numbers
of blocks, as the new lone BAT is much larger than a BAT was previously.
was: HDFS-1081 optimized the number of block access tokens (BATs) created in a
single call to getBlockLocations, as this is an expensive operation. However,
that JIRA put off another optimization which was then made possible, which is
to just send a single block access token across the wire (and maintain a single
BAT on the client side). This JIRA is for implementing that optimization.
Since a single BAT is generated for all the blocks, we just write that single
BAT to the wire, rather than writing n BATs for n blocks, as is currently done.
This turns out to be a useful optimization for files with very large numbers
of blocks, as the new lone BAT is much larger than a BAT was previously.
While benchmarking this new patch, originally an addendum to HDFS-1081, we
determined that HDFS-1081's original benchmarks were in error: getBlockLocations
was not the culprit in the performance degradation. HDFS-1081 didn't hurt
speed and, with this addendum, actually gives some benefit for files
with moderate numbers of blocks (see to-be-attached benchmarks). However,
since getBlockLocations isn't really a slow method, these gains aren't worth
the extra complexity they introduce. I'll upload the on-the-wire optimization
patch in case it becomes useful at some point, but I'm going to use this JIRA
to roll back most of HDFS-1081, excluding some byte-array allocation that we
can easily cache. ...sigh.
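
For readers who want a concrete picture of the on-the-wire idea from the
description (one BAT for the whole call instead of one BAT per block), here is
a minimal sketch. It is not the code from either patch. The class name, the
wire layout (8-byte block id, length-prefixed token bytes) and the token sizes
are all assumptions made up for illustration, and no Hadoop classes are used.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Illustrative only: hypothetical wire layout, not the HDFS format. */
public class BatWireSketch {

    /** Writes a token as a 4-byte length followed by its raw bytes. */
    private static void writeToken(DataOutputStream out, byte[] token) throws IOException {
        out.writeInt(token.length);
        out.write(token);
    }

    /** Per-block layout: block count, then (block id, token) for each block. */
    public static byte[] writePerBlock(long[] blockIds, byte[][] tokens) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(blockIds.length);
            for (int i = 0; i < blockIds.length; i++) {
                out.writeLong(blockIds[i]);
                writeToken(out, tokens[i]);
            }
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
    }

    /** Shared layout: block count, one token for the whole call, then the block ids. */
    public static byte[] writeShared(long[] blockIds, byte[] sharedToken) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(blockIds.length);
            writeToken(out, sharedToken);
            for (long id : blockIds) {
                out.writeLong(id);
            }
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] ids = {1L, 2L, 3L};
        // Assumed sizes: 16-byte per-block tokens, one larger 40-byte shared token.
        byte[][] perBlock = {new byte[16], new byte[16], new byte[16]};
        byte[] shared = new byte[40];
        System.out.println("per-block bytes: " + writePerBlock(ids, perBlock).length);
        System.out.println("shared bytes:    " + writeShared(ids, shared).length);
    }
}
```

In this toy layout the shared form pays for the larger token once, so any
saving grows with the number of blocks, while for very few blocks the larger
token can outweigh it. Real numbers depend on the actual token encoding, which
this sketch does not model.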
> Remove most of getBlockLocation optimization
> --------------------------------------------
>
> Key: HDFS-1353
> URL: https://issues.apache.org/jira/browse/HDFS-1353
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: name-node
> Affects Versions: 0.21.0
> Reporter: Jakob Homan
> Assignee: Jakob Homan
> Fix For: 0.21.1
>
> Attachments: Benchmarking results.xlsx, HDFS-1353-y20.patch
>