[jira] [Commented] (HTRACE-164) htrace hrpc: use msgpack for serialization

Colin Patrick McCabe (JIRA) Tue, 05 May 2015 19:57:41 -0700

    [ 
https://issues.apache.org/jira/browse/HTRACE-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529797#comment-14529797
 ]


Colin Patrick McCabe commented on HTRACE-164:
---------------------------------------------

bq. I can't use the library either if their rpc is tied together.

Elliot, there is nothing about this patch that impacts you at all.  You don't 
use htraced, and this patch only affects htraced.  This patch doesn't add any 
dependencies to the {{libhtrace.so}} library... in fact, it reduces the number 
of dependencies.  If this patch impacts you at all, which is very doubtful, the 
impact would be positive: it removes dependencies from the C client that might 
cause conflicts.  This patch doesn't affect {{htrace-core}}, only 
{{htrace-htraced}} and {{htrace-c}}.

bq. I think static linking would be better (that pushes the work onto the 
htrace devs rather than the users).

Actually, it's the exact opposite.  Static linking pushes the work onto the 
users.  As I indicated earlier, most Linux distros don't provide static 
versions of anything.  And we can't bundle most dependencies because of license 
incompatibilities (GPL vs Apache, etc.).  For example, libcurl depends on 
libidn, which is LGPL.  We can't bundle that, and I don't even think we can 
make it a mandatory part of our build process without raising issues.  So we 
would be basically throwing up our hands and saying "sorry, users, you'll need 
to figure out a way to build this mess statically.  Good luck."  A big part of 
the motivation for this patch is getting rid of those dependencies and 
simplifying the build.

bq. However I still think it's really bad to include another serialization 
format and to write our own rpc. There's just nothing that's complex enough or 
performance sensitive enough about this to justify the added complexity.

Sending spans over the wire is performance sensitive.  One of the goals of 
htrace is to be usable in production.  A verbose format like JSON consumes 
both network bandwidth and CPU time, which is not a good idea if the goal is 
to be usable in production.
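
To make the size argument concrete, here is a minimal sketch that encodes the same span once as JSON and once as a msgpack map, then compares the byte counts.  The {{Span}} struct, its field names, and the sample values are assumptions made up for illustration and are not the schema htraced actually sends.  The msgpack encoder is hand-written against the published format and only covers the few types this struct needs.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Span is a hypothetical, simplified span record for illustration only.
// It is not the schema htraced actually puts on the wire.
type Span struct {
	TraceID     uint64 `json:"traceId"`
	SpanID      uint64 `json:"spanId"`
	Begin       uint64 `json:"begin"`
	End         uint64 `json:"end"`
	Description string `json:"description"`
}

// appendStr appends s as a msgpack string. Only the fixstr (< 32 bytes)
// and str 8 (< 256 bytes) forms are handled, which is enough for a sketch.
func appendStr(b []byte, s string) []byte {
	switch {
	case len(s) < 32:
		b = append(b, 0xa0|byte(len(s)))
	case len(s) < 256:
		b = append(b, 0xd9, byte(len(s)))
	default:
		panic("string too long for this sketch")
	}
	return append(b, s...)
}

// appendUint appends v in the smallest msgpack unsigned form that fits.
func appendUint(b []byte, v uint64) []byte {
	switch {
	case v < 1<<7: // positive fixint
		return append(b, byte(v))
	case v < 1<<8: // uint 8
		return append(b, 0xcc, byte(v))
	case v < 1<<16: // uint 16, big-endian
		return append(b, 0xcd, byte(v>>8), byte(v))
	case v < 1<<32: // uint 32, big-endian
		return append(b, 0xce, byte(v>>24), byte(v>>16), byte(v>>8), byte(v))
	default: // uint 64, big-endian
		b = append(b, 0xcf)
		for shift := 56; shift >= 0; shift -= 8 {
			b = append(b, byte(v>>uint(shift)))
		}
		return b
	}
}

// EncodeMsgpack writes the span as a 5-entry msgpack map (fixmap, 0x85).
func EncodeMsgpack(s Span) []byte {
	b := []byte{0x85}
	b = appendUint(appendStr(b, "traceId"), s.TraceID)
	b = appendUint(appendStr(b, "spanId"), s.SpanID)
	b = appendUint(appendStr(b, "begin"), s.Begin)
	b = appendUint(appendStr(b, "end"), s.End)
	b = appendStr(appendStr(b, "description"), s.Description)
	return b
}

// Sizes returns the encoded length of s in JSON and in msgpack.
func Sizes(s Span) (jsonLen, msgpackLen int, err error) {
	j, err := json.Marshal(s)
	if err != nil {
		return 0, 0, err
	}
	return len(j), len(EncodeMsgpack(s)), nil
}

func main() {
	s := Span{
		TraceID:     0x1234567890abcdef,
		SpanID:      0xfedcba0987654321,
		Begin:       1430880000000, // milliseconds since the epoch
		End:         1430880000001,
		Description: "getFileInfo",
	}
	j, m, err := Sizes(s)
	if err != nil {
		panic(err)
	}
	fmt.Printf("json=%d bytes, msgpack=%d bytes\n", j, m)
}
```

Most of the difference in a record like this comes from the 64-bit IDs and timestamps, which take a fixed 9 bytes in msgpack but up to 20 decimal digits in JSON.  The sketch only measures size; it says nothing about encode or decode CPU time, which would need a real benchmark.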

bq. The hadoop world has settled on protobuf. Unless something is unworkable 
with protobuf then I don't see using anything else.

We are not Hadoop, and we don't need to use the same things Hadoop uses.  And 
anyway, as I explained earlier, using protobuf actually makes things more 
complex for people using Hadoop, not less, because of the library version 
conflicts.  In Java-land, we can solve a lot of that with shading, but in C 
it's much harder to do, as I explained above.  These issues are more than 
theoretical to me since I suffered through the protobuf 2.4.1 -> 2.5 transition 
in Hadoop.  I still can't build old versions of Hadoop without using a VM and I 
am bitter about that.

bq. Most hadoop/hbase ( where htrace will be used most often for now) installs 
right now will have: Avro, Protobuf, Thrift, Json.  That list is laughably bad. 
Adding another that's less supported than the above ones just seems like nih. 
Then we're going to have to write and maintain our own rpc stack. Something 
that provides almost no value to the end user.

Again, HTrace is not Hadoop.  HTrace has never used Avro or Thrift.  The only 
serialization format we've ever had in htrace-core is JSON.  htrace-hbase uses 
protobuf, but htraced has never used it.  The subprojects of HTrace have 
separate dependencies.  This is by design so that different span receivers can 
do what's appropriate for them.  One size does not fit all.  The subprojects 
also carefully shade their dependencies so that it doesn't impact the client 
CLASSPATH.  There is absolutely no impact on users from this change.

Using something just because Hadoop uses it is practically the definition of 
NIH.  Nobody is asking you to learn msgpack or even to care about the fact that 
it exists.  Implementing it in htraced was practically a one-line change.  

I could change it back to JSON serialization just by doing:
{code}
- mh := new(codec.MsgpackHandle)
+ mh := new(codec.JsonHandle)
{code}

in two or three places in the code.  I've spent way more bytes and time in this 
discussion than I ever spent implementing or maintaining any of this.
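
The reason a format swap can be that small is that the encoding sits behind a handle, so the rest of the code never names the wire format.  The sketch below shows that design using only the Go standard library.  The {{Handle}} interface, {{JSONHandle}}, {{GobHandle}}, {{RoundTrip}}, and the {{Span}} struct are hypothetical names invented for this illustration, and {{encoding/gob}} stands in for msgpack as "some compact binary format"; none of this is htraced's actual code or the codec library's API.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"encoding/json"
	"fmt"
)

// Span is a hypothetical, simplified span record for illustration only.
type Span struct {
	TraceID     uint64
	Description string
	Begin, End  int64
}

// Handle hides the wire format from the code that sends and receives spans.
// (Hypothetical interface; not the API of any real codec library.)
type Handle interface {
	Marshal(v interface{}) ([]byte, error)
	Unmarshal(data []byte, v interface{}) error
}

// JSONHandle encodes with encoding/json.
type JSONHandle struct{}

func (JSONHandle) Marshal(v interface{}) ([]byte, error) { return json.Marshal(v) }

func (JSONHandle) Unmarshal(data []byte, v interface{}) error {
	return json.Unmarshal(data, v)
}

// GobHandle encodes with encoding/gob, used here as a stand-in for msgpack.
type GobHandle struct{}

func (GobHandle) Marshal(v interface{}) ([]byte, error) {
	var buf bytes.Buffer
	err := gob.NewEncoder(&buf).Encode(v)
	return buf.Bytes(), err
}

func (GobHandle) Unmarshal(data []byte, v interface{}) error {
	return gob.NewDecoder(bytes.NewReader(data)).Decode(v)
}

// RoundTrip encodes and decodes a span through h. It never mentions a
// concrete format, so it needs no change when the handle is swapped.
func RoundTrip(h Handle, in Span) (Span, error) {
	var out Span
	data, err := h.Marshal(in)
	if err != nil {
		return out, err
	}
	err = h.Unmarshal(data, &out)
	return out, err
}

func main() {
	// Changing the wire format means changing only this line.
	var h Handle = GobHandle{} // or JSONHandle{}

	in := Span{TraceID: 42, Description: "getFileInfo", Begin: 100, End: 250}
	out, err := RoundTrip(h, in)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", out)
}
```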

We are always going to support accessing htraced via HTTP/JSON because that is 
what the JavaScript UI uses.  You will never be unable to access htraced or 
write an htraced client because you don't know msgpack.

If you think msgpack is a plague upon mankind and you want to write your own 
JSON client for htraced-- great.  We will always support the JSON API over 
plain old HTTP.  Nobody is asking you to learn or maintain anything.  If you 
want to add support for other span receivers besides htraced and localFile in 
the libhtrace.so C client-- awesome.  We would love to support more span 
receivers there.  Let's be constructive here.

> htrace hrpc: use msgpack for serialization
> ------------------------------------------
>
>                 Key: HTRACE-164
>                 URL: https://issues.apache.org/jira/browse/HTRACE-164
>             Project: HTrace
>          Issue Type: Bug
>    Affects Versions: 3.2.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HTRACE-164.002.patch
>
>
> htrace HRPC should use msgpack for serialization.  Messages serialized using 
> msgpack use less space on the wire and use less CPU time to encode.  The CMP 
> library allows us to include msgpack support easily in the htrace C client.  
> There is also good Java and Golang support available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
