[ 
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16704118#comment-16704118
 ] 

Colin P. McCabe commented on HADOOP-15566:
------------------------------------------

Hi folks,

I just saw this JIRA while searching for something else.  I was one of the guys 
who worked on HTrace, both on the Hadoop integration side and on the HTrace 
project itself.  It is definitely sad that it didn't make it out of the 
incubator.  There is clearly a need for this kind of work in Hadoop and in 
other projects.

I don't have a strong opinion about which other tracing API should be used in 
Hadoop.  I would caution everyone that Hadoop's compatibility shackles are 
heavy -- very heavy indeed.  Just to give an example, a typical Hadoop 
installation might have HDFS, HBase, and Phoenix installed.  These projects all 
have separate developers, PMCs, and release cycles, but expect to be able to 
share the same CLASSPATH happily.  Projects often push back very hard on trying 
to update library dependencies, especially in "minor" releases.  To add to 
that, people often stay on older stable versions of Hadoop for years.

In theory, Hadoop vendors offer a snapshot of the full Hadoop stack, carefully 
configured so that things work together.  In practice, libraries are not always 
harmonized as well as we would like.  Some users want to mix and match versions 
of things, or not even use a vendor distribution at all.  This makes setting up 
end-to-end tracing pretty difficult.

There were some efforts to add better CLASSPATH isolation to Hadoop.  I haven't 
kept up with those, so I don't know how much this situation has improved.

I do think that the idea of keeping HTrace around as a shim API might make 
sense for Hadoop.  This would mean that adding support for a new version of the 
OpenTracing or Zipkin library would only require updating that shim code in 
hadoop-common, rather than trying to coordinate changes across a dozen Hadoop 
projects.
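
To make the shim idea concrete, here is a rough sketch in Java. It is not HTrace's or Hadoop's actual API: every name in it (TracingShimSketch, ShimTracer, SpanBackend, SpanHandle, Scope, RecordingBackend) is hypothetical and chosen only for illustration. The point is that downstream projects would compile against ShimTracer and Scope alone, while the single SpanBackend implementation is the only code that would need to change when the underlying tracing library is upgraded or swapped.

```java
import java.util.ArrayList;
import java.util.List;

public class TracingShimSketch {

    /** Hypothetical backend interface: the only piece that would touch a real tracing library. */
    interface SpanBackend {
        SpanHandle start(String description);
    }

    /** Hypothetical handle for one in-flight span. */
    interface SpanHandle {
        void annotate(String key, String value);
        void finish();
    }

    /** Default backend: does nothing, so callers work even when no tracer is configured. */
    static final class NoopBackend implements SpanBackend {
        private static final SpanHandle NOOP = new SpanHandle() {
            @Override public void annotate(String key, String value) { }
            @Override public void finish() { }
        };
        @Override public SpanHandle start(String description) { return NOOP; }
    }

    /** Stand-in backend for this demo; it only records the names of finished spans. */
    static final class RecordingBackend implements SpanBackend {
        final List<String> finished = new ArrayList<>();
        @Override public SpanHandle start(String description) {
            return new SpanHandle() {
                @Override public void annotate(String key, String value) { }
                @Override public void finish() { finished.add(description); }
            };
        }
    }

    /** The stable surface that downstream projects would compile against. */
    static final class ShimTracer {
        private volatile SpanBackend backend = new NoopBackend();

        void setBackend(SpanBackend backend) { this.backend = backend; }

        Scope newScope(String description) {
            return new Scope(backend.start(description));
        }
    }

    /** AutoCloseable so callers can use try-with-resources to end the span. */
    static final class Scope implements AutoCloseable {
        private final SpanHandle handle;
        Scope(SpanHandle handle) { this.handle = handle; }
        void addAnnotation(String key, String value) { handle.annotate(key, value); }
        @Override public void close() { handle.finish(); }
    }

    public static void main(String[] args) {
        ShimTracer tracer = new ShimTracer();
        RecordingBackend backend = new RecordingBackend();
        tracer.setBackend(backend);
        // Caller code never sees the backend type, only the shim.
        try (Scope scope = tracer.newScope("readBlock")) {
            scope.addAnnotation("path", "/tmp/example");
        }
        System.out.println(backend.finished);
    }
}
```

In this sketch, moving to a different tracing library would mean writing one new SpanBackend; the try-with-resources call sites in other projects would stay as they are.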

Also, HTrace already has code to export spans to Zipkin, if that helps.  I 
think it would be relatively straightforward to write the same thing for 
OpenTracing as well.

> Remove HTrace support
> ---------------------
>
>                 Key: HADOOP-15566
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15566
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: metrics
>    Affects Versions: 3.1.0
>            Reporter: Todd Lipcon
>            Priority: Major
>              Labels: security
>         Attachments: Screen Shot 2018-06-29 at 11.59.16 AM.png, 
> ss-trace-s3a.png
>
>
> The HTrace incubator project has voted to retire itself and won't be making 
> further releases. The Hadoop project currently has various hooks with HTrace. 
> It seems in some cases (e.g. HDFS-13702) these hooks have had measurable 
> performance overhead. Given these two factors, I think we should consider 
> removing the HTrace integration. If there is someone willing to do the work, 
> replacing it with OpenTracing might be a better choice since there is an 
> active community.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
