[ 
https://issues.apache.org/jira/browse/HDFS-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493363#comment-14493363
 ] 

Josh Elser commented on HDFS-8088:
----------------------------------

I ran a write-heavy test with the patch here as well as the HDFS-8026 patch on 
top of 2.7.1-SNAPSHOT, and found one last span "hotspot" (wonderfully 
formatted, courtesy of [~billie.rinaldi]):

{noformat}
# Total spans from HDFS
{type='HDFS', nonzeroCount=6941, zeroCount=77221, numTraces=338, 
log10SpanLength=[77221, 5336, 1594, 11, 0, 0, 0]}, total 84162

# Offender
DFSOutputStream#write={type='HDFS', nonzeroCount=4252, zeroCount=75000, 
numTraces=24, log10SpanLength=[75000, 3598, 654, 0, 0, 0, 0]}
{noformat}
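
To put the numbers above in proportion, here is a small sketch that re-adds the histograms and computes how much of the total comes from {{DFSOutputStream#write}}. The {{SpanHistogram}} class and its {{bucket}} helper are hypothetical, not part of HTrace or HDFS. The bucket layout (bucket 0 for 0ms spans, bucket i for spans in [10^(i-1), 10^i) ms) is my assumption, based on the first entry of log10SpanLength matching zeroCount.

```java
import java.util.Arrays;

// Hypothetical helper for reading the output above; not part of HTrace or HDFS.
final class SpanHistogram {
    // Assumed layout: bucket 0 holds 0 ms spans, bucket i holds spans in
    // [10^(i-1), 10^i) ms, and the last bucket catches anything longer.
    static int bucket(long millis, int numBuckets) {
        if (millis <= 0) {
            return 0;
        }
        int b = 1;
        for (long bound = 10; millis >= bound && b < numBuckets - 1; bound *= 10) {
            b++;
        }
        return b;
    }

    static long sum(long[] counts) {
        return Arrays.stream(counts).sum();
    }

    public static void main(String[] args) {
        // Counts copied from the output above.
        long[] all = {77221, 5336, 1594, 11, 0, 0, 0};
        long[] write = {75000, 3598, 654, 0, 0, 0, 0};

        long total = sum(all);        // zeroCount + nonzeroCount for all HDFS spans
        long writeTotal = sum(write); // same for DFSOutputStream#write

        System.out.println("total HDFS spans: " + total);
        System.out.println("DFSOutputStream#write spans: " + writeTotal);
        System.out.printf("write share of all spans: %.1f%%%n",
                100.0 * writeTotal / total);
        System.out.printf("write share of 0ms spans: %.1f%%%n",
                100.0 * write[0] / all[0]);
    }
}
```

By this arithmetic, {{DFSOutputStream#write}} accounts for about 94% of all HDFS spans and about 97% of the 0ms ones.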

Giving a very quick look at the code (and making what's possibly a bad guess), 
perhaps all of the 0ms length spans (denoted by zeroCount in the above, as 
opposed to the nonzeroCount) are when {{DFSOutputStream#writeChunk}} is only 
appending data into the current packet and not actually submitting that packet 
for the data streamer to process? With some more investigation into the 
hierarchy, I bet I could definitively determine that.
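
To illustrate the guess, here is a toy model of the behavior I have in mind. It is only a sketch: {{ToyPacketWriter}} and all of its fields are hypothetical and do not mirror the real {{DFSOutputStream}} code, and the packet size of 4 chunks is picked arbitrarily. The point is just that most calls would take a cheap append-only path and only every Nth call would hand a packet to the streamer.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model only. Names are hypothetical and do not mirror DFSOutputStream.
final class ToyPacketWriter {
    private final int chunksPerPacket;
    private int chunksInCurrentPacket = 0;

    // Stands in for the queue that the data streamer drains.
    final Queue<Integer> streamerQueue = new ArrayDeque<>();
    int appendOnlyCalls = 0;
    int enqueueCalls = 0;

    ToyPacketWriter(int chunksPerPacket) {
        this.chunksPerPacket = chunksPerPacket;
    }

    void writeChunk() {
        chunksInCurrentPacket++;
        if (chunksInCurrentPacket < chunksPerPacket) {
            // Cheap path: in-memory append only. If each call got its own
            // span, this is the kind of call I would expect to show up as 0ms.
            appendOnlyCalls++;
            return;
        }
        // Packet is full: submit it for the streamer and start a new one.
        streamerQueue.add(chunksInCurrentPacket);
        chunksInCurrentPacket = 0;
        enqueueCalls++;
    }

    public static void main(String[] args) {
        ToyPacketWriter writer = new ToyPacketWriter(4);
        for (int i = 0; i < 20; i++) {
            writer.writeChunk();
        }
        System.out.println("append-only calls: " + writer.appendOnlyCalls);
        System.out.println("enqueue calls: " + writer.enqueueCalls);
    }
}
```

If the real code behaves anything like this, tracing only the call that submits a packet would drop most of the 0ms spans, though that still needs to be checked against the actual span hierarchy.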

That being said, I hope I'm not being too much of a bother with all this. I was 
just really excited to see this functionality in HDFS and want to make sure we're 
getting good data coming back out. Thanks for bearing with me and for the 
patches you've already made!

> Reduce the number of HTrace spans generated by HDFS reads
> ---------------------------------------------------------
>
>                 Key: HDFS-8088
>                 URL: https://issues.apache.org/jira/browse/HDFS-8088
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-8088.001.patch
>
>
> HDFS generates too many trace spans on read right now.  Every call to read() 
> we make generates its own span, which is not very practical for things like 
> HBase or Accumulo that do many such reads as part of a single operation.  
> Instead of tracing every call to read(), we should only trace the cases where 
> we refill the buffer inside a BlockReader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
