[ https://issues.apache.org/jira/browse/HDFS-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484542#comment-14484542 ]

Colin Patrick McCabe commented on HDFS-8069:
--------------------------------------------

bq. Josh wrote: As Billie said, we're not tracing the tracing code.

Thanks for confirming this.  Just to double-check, can you confirm that you 
have {{hadoop.htrace.sampler}} set to nothing (the default)?
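
For reference, here is a minimal sketch for double-checking what the 
client-side {{Configuration}} actually resolves (the property name is the one 
above; everything else is illustrative):

{code:java}
import org.apache.hadoop.conf.Configuration;

public class SamplerCheck {
  public static void main(String[] args) {
    // Picks up core-site.xml / hdfs-site.xml from the classpath.
    Configuration conf = new Configuration();
    // Unset (the default) means the HDFS client starts no spans on its own;
    // spans only appear when a caller such as Accumulo is already tracing.
    System.out.println("hadoop.htrace.sampler = "
        + conf.get("hadoop.htrace.sampler", "<unset>"));
  }
}
{code}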

bq. Josh wrote: \[a second cluster is\] A non-starter for me. We've had 
distributed tracing support built into Accumulo for years without issue. To 
suddenly inform users that they need to spin up a second cluster is a no-go.

Understood.  I think that the configuration you outlined, where 
{{hadoop.htrace.sampler}} is set to NeverSampler (or left unset) and all 
sampling happens at the Accumulo level, should work.  We just need to fix the 
issues we currently have.
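
For what it's worth, a minimal sketch of that setup from the caller's side, 
using the HTrace 3 API visible in the stack trace below (the span name and the 
traced method are made up for illustration):

{code:java}
import org.apache.htrace.Sampler;
import org.apache.htrace.Trace;
import org.apache.htrace.TraceScope;

public class CallerSideSampling {
  public static void main(String[] args) {
    // With hadoop.htrace.sampler unset/NeverSampler, HDFS never starts a
    // trace on its own; it only adds child spans when the caller is tracing.
    TraceScope scope = Trace.startSpan("walogRecovery", Sampler.ALWAYS);
    try {
      replayWriteAheadLog();  // hypothetical work; HDFS reads inside this
                              // call become children of "walogRecovery"
    } finally {
      scope.close();  // delivers the span to the registered SpanReceiver
    }
  }

  private static void replayWriteAheadLog() {
    // stub standing in for the real application code
  }
}
{code}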

bq. Billie wrote: I think this might be the case \[that HDFS tracing is too 
chatty\]. Creating spans for byte array reads of one byte or more effectively 
makes us unable to trace client operations if they happen to use 
DFSInputStream, which we are using to read walogs. Operations involving 
Accumulo's RFiles seem to be in better shape since we are reading blocks from 
them.

I am going to open an issue in HDFS to only trace the cases where we actually 
fill the buffer of the HDFS BlockReader.  I think that it's a reasonable 
tradeoff to make, given that filling the HDFS BlockReader buffer tends to be 
the main thing that delays readers from HDFS.  Just reading a byte from the 
existing in-memory buffer seldom, if ever, causes any delay.
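
To make the tradeoff concrete, here is a rough sketch of the shape of that 
change (illustrative only, not the actual DFSInputStream/BlockReader patch):

{code:java}
import java.io.IOException;
import org.apache.htrace.Trace;
import org.apache.htrace.TraceScope;

/** Illustrative reader: trace buffer refills (the expensive network round
 *  trips), not the per-byte reads served from the in-memory buffer. */
class BufferedReadSketch {
  private final byte[] buffer = new byte[64 * 1024];
  private int pos = 0, limit = 0;

  int read(byte[] dst, int off, int len) throws IOException {
    if (pos == limit) {
      // Only the slow path gets a span; Trace.startSpan(String) creates a
      // real span only when a caller is already tracing.
      TraceScope scope = Trace.startSpan("BlockReader#fillBuffer");
      try {
        limit = fillFromDataNode(buffer);  // hypothetical network refill
        pos = 0;
        if (limit <= 0) {
          return -1;
        }
      } finally {
        scope.close();
      }
    }
    int n = Math.min(len, limit - pos);
    System.arraycopy(buffer, pos, dst, off, n);
    pos += n;
    return n;
  }

  private int fillFromDataNode(byte[] b) throws IOException {
    return -1;  // stub: the real code reads from the DataNode
  }
}
{code}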

bq. Billie wrote: We are only tracing one Accumulo operation, but it is a 
fairly complex operation. So even if we traced this operation less often, we 
would still run into this issue.

If the Accumulo operation is big enough, it may be necessary to split it into 
multiple HTrace spans.  For example, I think tracing an entire compaction would 
be too big.  We may have to experiment with this somewhat.
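
A hedged sketch of what that split could look like, with one parent span and 
bounded child spans per phase (the names and phases are made up):

{code:java}
import org.apache.htrace.Sampler;
import org.apache.htrace.Trace;
import org.apache.htrace.TraceScope;

public class SplitSpansSketch {
  public static void main(String[] args) {
    // One parent span for the whole operation...
    TraceScope compaction = Trace.startSpan("compaction", Sampler.ALWAYS);
    try {
      for (String file : new String[] {"A.rf", "B.rf"}) {
        // ...and a child span per phase, so no single span has to absorb
        // hundreds of thousands of timeline annotations.
        TraceScope read = Trace.startSpan("compaction.read " + file);
        try {
          // read one input file here
        } finally {
          read.close();
        }
      }
      TraceScope write = Trace.startSpan("compaction.write");
      try {
        // write the merged output here
      } finally {
        write.close();
      }
    } finally {
      compaction.close();
    }
  }
}
{code}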

> Tracing implementation on DFSInputStream seriously degrades performance
> -----------------------------------------------------------------------
>
>                 Key: HDFS-8069
>                 URL: https://issues.apache.org/jira/browse/HDFS-8069
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.7.0
>            Reporter: Josh Elser
>            Priority: Critical
>
> I've been doing some testing of Accumulo with HDFS 2.7.0 and have noticed a 
> serious performance impact when Accumulo registers itself as a SpanReceiver.
> The context of the test in which I noticed the impact is that an Accumulo 
> process reads a series of updates from a write-ahead log. This is just 
> reading a series of Writable objects from a file in HDFS. With tracing 
> enabled, I waited for at least 10 minutes and the server still hadn't read a 
> ~300MB file.
> Doing a poor-man's inspection via repeated thread dumps, I always see 
> something like the following:
> {noformat}
> "replication task 2" daemon prio=10 tid=0x0000000002842800 nid=0x794d runnable [0x00007f6c7b1ec000]
>    java.lang.Thread.State: RUNNABLE
>         at java.util.concurrent.CopyOnWriteArrayList.iterator(CopyOnWriteArrayList.java:959)
>         at org.apache.htrace.Tracer.deliver(Tracer.java:80)
>         at org.apache.htrace.impl.MilliSpan.stop(MilliSpan.java:177)
>         - locked <0x000000077a770730> (a org.apache.htrace.impl.MilliSpan)
>         at org.apache.htrace.TraceScope.close(TraceScope.java:78)
>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:898)
>         - locked <0x000000079fa39a48> (a org.apache.hadoop.hdfs.DFSInputStream)
>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:697)
>         - locked <0x000000079fa39a48> (a org.apache.hadoop.hdfs.DFSInputStream)
>         at java.io.DataInputStream.readByte(DataInputStream.java:265)
>         at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
>         at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
>         at org.apache.accumulo.core.data.Mutation.readFields(Mutation.java:951)
>         ... more accumulo code omitted ...
> {noformat}
> What I'm seeing here is that reading a single byte (in 
> WritableUtils.readVLong) causes a new span to be created and closed (which 
> includes a flush to the SpanReceiver). This results in an extreme number of 
> spans for {{DFSInputStream.byteArrayRead}} just for reading a file from HDFS 
> -- over 700k spans for a file of only a few hundred MB.
> Perhaps there's something different we need to do for the SpanReceiver in 
> Accumulo? I'm not entirely sure, but this was rather unexpected.
> cc/ [~cmccabe]


