[
https://issues.apache.org/jira/browse/HDFS-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494760#comment-14494760
]
Colin Patrick McCabe commented on HDFS-8088:
--------------------------------------------
bq. I re-ran my test on Hadoop-2.7.1-SNAP with your patch applied, Colin, and
things are much happier. The performance is much closer to what I previously
saw with 2.6.0 (without any quantitative measurements). +1 (non-binding, ofc)
Thanks, Josh. I discovered that we are reading non-trivial amounts of remote
data inside the {{DFSInputStream#blockSeekTo}} method, so I think we'll also
need to create a trace span for that one. Also, the {{BlockReader}} trace
scopes will need to use the {{DFSClient#traceSampler}} (currently they don't)
or else we will never get any trace spans from reads. I think that is what we
would need to get the patch on this JIRA committed.
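To make that concrete, here is a minimal sketch of the pattern I have in mind,
assuming the HTrace 3.1 API ({{org.apache.htrace.Trace}}, {{TraceScope}},
{{Sampler}}). The class and helper names below are illustrative stand-ins, not
the actual patch; the sampler field plays the role of {{DFSClient#traceSampler}}:
{code:java}
// Sketch only -- not the HDFS-8088 patch. Shows a block-seek path wrapped in
// its own span, gated by the client's top-level sampler, so slow remote reads
// during a seek show up in traces without creating a span per read() call.
import java.io.IOException;
import org.apache.htrace.Sampler;
import org.apache.htrace.Trace;
import org.apache.htrace.TraceScope;

class BlockSeekTracingSketch {
  private final Sampler<?> traceSampler;  // stands in for DFSClient#traceSampler

  BlockSeekTracingSketch(Sampler<?> traceSampler) {
    this.traceSampler = traceSampler;
  }

  void blockSeekTo(long targetOffset) throws IOException {
    // The sampler decides whether to start a new trace when no parent span is
    // active; with a parent span, this simply becomes a child span.
    TraceScope scope = Trace.startSpan("DFSInputStream#blockSeekTo", traceSampler);
    try {
      doSeek(targetOffset);  // placeholder for fetching locations and connecting
    } finally {
      scope.close();
    }
  }

  private void doSeek(long targetOffset) throws IOException {
    // placeholder: look up block locations, pick a datanode, set up a BlockReader
  }
}
{code}
The same idea applies to the {{BlockReader}} scopes: pass the sampler at span
creation so reads can be traced even when there is no parent span already active.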
bq. Giving a very quick look at the code (and making what's possibly a bad
guess), perhaps all of the 0ms-length spans (denoted by zeroCount in the above,
as opposed to the nonzeroCount) are when DFSOutputStream#writeChunk is only
appending data into the current packet and not actually submitting that packet
for the data streamer to process? With some more investigation into the
hierarchy, I bet I could definitively determine that.
Keep in mind that doing a write in HDFS just hands the data off to a background
thread called {{DataStreamer}}, which writes it out asynchronously. The only
reasons {{writeChunk}} would ever take much longer than 0 ms are lock contention
(the {{DataStreamer#waitAndQueuePacket}} method couldn't get the
{{DataStreamer#dataQueue}} lock immediately) or having more than
{{dfs.client.write.max-packets-in-flight}} unacked messages in flight already.
(HDFS refers to these messages as "packets", even though each message typically
spans multiple Ethernet packets.)
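For anyone following along, here is a heavily simplified sketch of that handoff.
This is not the real {{DFSOutputStream}} / {{DataStreamer}} code; the method and
field names mirror the ones mentioned above, and the queue-size check stands in
for {{dfs.client.write.max-packets-in-flight}}:
{code:java}
// Simplified illustration only, not actual DFSClient code. It shows why
// writeChunk() is normally ~0 ms: the writer thread just enqueues a packet for
// a background streamer thread, and only blocks when the queue already holds
// the configured maximum number of unacked packets.
import java.util.ArrayDeque;
import java.util.Deque;

class PacketHandoffSketch {
  private final Deque<byte[]> dataQueue = new ArrayDeque<>();
  private final int maxPacketsInFlight;  // stand-in for the max-packets-in-flight limit

  PacketHandoffSketch(int maxPacketsInFlight) {
    this.maxPacketsInFlight = maxPacketsInFlight;
  }

  // Called on the writer thread; analogous to DataStreamer#waitAndQueuePacket.
  void waitAndQueuePacket(byte[] packet) throws InterruptedException {
    synchronized (dataQueue) {
      // Block only while too many unacked packets are already queued.
      while (dataQueue.size() >= maxPacketsInFlight) {
        dataQueue.wait();
      }
      dataQueue.addLast(packet);
      dataQueue.notifyAll();  // wake the streamer thread
    }
  }

  // Runs on the background streamer thread: drains packets and sends them.
  void streamerLoop() throws InterruptedException {
    while (true) {
      byte[] packet;
      synchronized (dataQueue) {
        while (dataQueue.isEmpty()) {
          dataQueue.wait();
        }
        packet = dataQueue.removeFirst();
        dataQueue.notifyAll();  // wake a writer waiting for queue space
      }
      send(packet);  // network I/O happens off the writer's fast path
    }
  }

  private void send(byte[] packet) {
    // placeholder for writing the packet to the datanode pipeline
  }
}
{code}
In this model the only places the writer thread can stall are the {{dataQueue}}
lock and the queue-full wait, which is exactly where a span around the packet
handoff would show interesting latency.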
I guess we have to step back and ask what the end goal is for HTrace. If the
end goal is figuring out why some requests had a high latency, it makes sense
to only trace parts of the program that we think will take a non-trivial amount
of time. In that case, we should probably only trace the handoff of the full
packet to the {{DataStreamer}}. If the end goal is understanding the
downstream consequences of all operations, then we have to connect the dots
across all operations. That's why I originally had all calls to write() and
read() create trace spans.
I'm inclined to lean more towards goal #1 (figure out why specific requests had
high latency) than goal #2. I think that looking at the high-latency outliers
will naturally lead us to fix the biggest performance issues (such as lock
contention, disk issues, network issues, etc.). Also, if all calls to write()
and read() create trace spans, then this will have a "multiplicative" effect on
our top-level sampling rate, which I think is undesirable.
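To put purely illustrative numbers on that (an assumption about the workload,
not a measurement): if the top-level sampler admits 1 in 1,000 operations, and
every read() and write() call consults that sampler independently, then an
operation that makes 1,000 such calls has roughly a 63% chance
(1 - 0.999^1000) of emitting at least one span, so the effective per-operation
trace rate ends up far above the configured 0.1%, on top of the extra span
volume generated within each sampled operation.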
bq. That being said, I hope I'm not being too much of a bother with all this. I
was just really excited to see this functionality in HDFS and want to make sure
we're getting good data coming back out. Thanks for bearing with me and for the
patches you've already made!
We definitely appreciate all the input. I think it's very helpful. I do think
maybe we should target 2.7.1 for some of these changes since I need to think
through everything. I know that's frustrating, but hopefully if we maintain a
reasonable Hadoop release cadence it won't be too bad. I'd also like to run
some patches by you guys to see if they improve the usefulness of HTrace for
you. And I am doing a bunch of testing internally, which I think will turn up a
lot more potential improvements to HTrace and to its integration into HDFS.
Use cases like this one are really helpful in motivating us here.
> Reduce the number of HTrace spans generated by HDFS reads
> ---------------------------------------------------------
>
> Key: HDFS-8088
> URL: https://issues.apache.org/jira/browse/HDFS-8088
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: HDFS-8088.001.patch
>
>
> HDFS generates too many trace spans on read right now. Every call to read()
> we make generates its own span, which is not very practical for things like
> HBase or Accumulo that do many such reads as part of a single operation.
> Instead of tracing every call to read(), we should only trace the cases where
> we refill the buffer inside a BlockReader.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)