[
https://issues.apache.org/jira/browse/HTRACE-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529797#comment-14529797
]
Colin Patrick McCabe commented on HTRACE-164:
---------------------------------------------
bq. I can't use the library either if their rpc is tied together.
Elliot, there is nothing about this patch that impacts you at all. You don't
use htraced, and this patch only affects htraced. This patch doesn't add any
dependencies to the {{libhtrace.so}} library... in fact, it reduces the number
of dependencies. If this patch impacts you at all, which is very doubtful, it
would be positively by removing dependencies from the C client that might cause
conflicts. This patch doesn't affect {{htrace-core}}, only {{htrace-htraced}}
and {{htrace-c}}.
bq. I think static linking would be better (that pushes the work onto the
htrace devs rather than the users).
Actually, it's the exact opposite. Static linking pushes the work onto the
users. As I indicated earlier, most Linux distros don't provide static
versions of anything. And we can't bundle most dependencies because of license
incompatibilities (GPL vs Apache, etc.). For example, libcurl depends on
libidn, which is LGPL. We can't bundle that, and I don't even think we can
make it a mandatory part of our build process without raising issues. So we
would be basically throwing up our hands and saying "sorry, users, you'll need
to figure out a way to build this mess statically. Good luck." A big part of
the motivation for this patch is getting rid of those dependencies and
simplifying the build.
bq. However I still think it's really bad to include another serialization
format and to write our own rpc. There's just nothing that's complex enough or
performance sensitive enough about this to justify the added complexity.
Sending spans over the wire is performance sensitive. One of the goals of
htrace is to be usable in production. Using a verbose format like JSON uses up
both network and CPU bandwidth and is not a good idea if the goal is to be
usable in production.
bq. The hadoop world has settled on protobuf. Unless something is unworkable
with protobuf then I don't see using anything else.
We are not Hadoop, and we don't need to use the same things Hadoop uses. And
anyway, as I explained earlier, using protobuf actually makes things more
complex for people using Hadoop, not less, because of the library version
conflicts. In Java-land, we can solve a lot of that with shading, but in C
it's much harder to do, as I explained above. These issues are more than
theoretical to me since I suffered through the protobuf 2.4.1 -> 2.5 transition
in Hadoop. I still can't build old versions of Hadoop without using a VM and I
am bitter about that.
bq. Most hadoop/hbase ( where htrace will be used most often for now) installs
right now will have: Avro, Protobuf, Thrift, Json. That list is laughably bad.
Adding another that's less supported than the above ones just seems like nih.
Then we're going to have to write and maintain our own rpc stack. Something
that provides almost no value to the end user.
Again, HTrace is not Hadoop. HTrace has never used Avro or Thrift. The only
serialization format we've ever had in htrace-core is JSON. htrace-hbase uses
protobuf, but htraced has never used it. The subprojects of HTrace have
separate dependencies. This is by design so that different span receivers can
do what's appropriate for them. One size does not fit all. The subprojects
also carefully shade their dependencies so that it doesn't impact the client
CLASSPATH. There is absolutely no impact on users from this change.
Using something just because Hadoop uses it is practically the definition of
NIH. Nobody is asking you to learn msgpack or even to care about the fact that
it exists. Implementing it in htraced was practically a one-line change.
I could change it back to json serialization just by doing:
{code}
- mh := new(codec.MsgpackHandle)
+ mh := new(codec.JsonHandle)
{code}
in two or three places in the code. I've spent way more bytes and time in this
discussion than I ever spent implementing or maintaining any of this.
We are always going to support accessing htraced via HTTP/JSON because that is
what the JavaScript UI uses. You will never be unable to access htraced or
write an htraced client because you don't know msgpack.
If you think msgpack is a plague upon mankind and you want to write your own
JSON client for htraced-- great. We will always support the JSON API over
plain old HTTP. Nobody is asking you to learn or maintain anything. If you
want to add support for other span receivers besides htraced and localFile in
the libhtrace.so C client-- awesome. We would love to support more span
receivers there. Let's be constructive here.
> htrace hrpc: use msgpack for serialization
> ------------------------------------------
>
> Key: HTRACE-164
> URL: https://issues.apache.org/jira/browse/HTRACE-164
> Project: HTrace
> Issue Type: Bug
> Affects Versions: 3.2.0
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: HTRACE-164.002.patch
>
>
> htrace HRPC should use msgpack for serialization. Messages serialized using
> msgpack use less space on the wire and use less CPU time to encode. The CMP
> library allows us to include msgpack support easily in the htrace C client.
> There is also good Java and Golang support available.