[
https://issues.apache.org/jira/browse/HTRACE-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530112#comment-14530112
]
Colin Patrick McCabe edited comment on HTRACE-164 at 5/6/15 7:50 PM:
---------------------------------------------------------------------
bq. No it's not but lets be honest about what the first 200 installs of this
software are going to be. The first uses will all be in the
hadoop/hbase/phoenix world. If you make life hard for the first users, none are
going to be compelled to use this other places.
I don't think that using msgpack makes life hard. Users don't need to know
that we are using msgpack under the hood. We have complete CLASSPATH isolation
on the Java client, and complete dependency pinning on the htraced server.
bq. I can't see any benefit to msgpack:
It makes the C library build simpler, which is a really nice goal. It maps
closely to JSON, much more closely than something like protobuf does. It
doesn't require a separate definition file that we'd have to keep in sync with
the span code. And it allows incremental serialization, which reduces the
number of copies we need to make.
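To illustrate the "maps closely to JSON" point, here is a rough sketch-- not
code from the patch-- of encoding a span-shaped map with the go/codec library
mentioned below. The field names and values are made up for illustration; note
that no separate definition file is involved.
{code}
package main

import (
	"bytes"
	"fmt"

	"github.com/ugorji/go/codec"
)

func main() {
	// A span represented as a plain map, the same shape it would have as JSON.
	// There is no .proto/.thrift definition file to keep in sync; the structure
	// is implicit in the data. (Field names here are hypothetical.)
	span := map[string]interface{}{
		"d": "getFileInfo",         // description
		"b": uint64(1430940547000), // begin time, ms
		"e": uint64(1430940547012), // end time, ms
	}

	var mh codec.MsgpackHandle
	var buf bytes.Buffer

	// Encoding is a couple of lines; the same MsgpackHandle is used to decode.
	enc := codec.NewEncoder(&buf, &mh)
	if err := enc.Encode(span); err != nil {
		panic(err)
	}
	fmt.Printf("encoded %d msgpack bytes\n", buf.Len())
}
{code}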
bq. A hand coded serialization format on the client side.
It's not hand-coded-- it uses the CMP library. And if we used something
like protobuf, we would still have hand-written code to copy the data from the
{{htrace_span}} class into the protobuf-generated class. It's the same amount
of code either way, and more efficient this way.
bq. Hard coded limit on span data size.
Disagree. There is no hard-coded limit on span data size-- unless you count
something absurd like 2 GB. Or maybe you're thinking of the 16 MB RPC size
limit; a 16 MB limit on span size is not a problem.
bq. I'm suggesting we use something that's used in large environments with lots
of users and lots of developers. The htrace project doesn't have to create a
new rpc format. We have no requirements that couldn't be answered by an already
existing stack. I'm literally saying: Don't invent something here just for
htrace.
Fortran is used in large environments with lots of users and lots of
developers. Your main argument literally seems to be that we should use
something because it's popular. That is a flawed argument.
bq. I think that adding msgpack doesn't gain enough to make the added
complexity on the client and the server worth it. gzipped json is pretty fast.
If we need faster and need to have an rpc then using an already existing rpc
stack makes more sense.
The extra complexity on the server is literally 5 or 6 lines-- maybe a little
more than that, because I switched to the more efficient go/codec library,
which has a slightly different interface. The complexity in the C client would
be there anyway, because a lot of it centers on size- and time-based flushing
of buffers.
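To be concrete about that claim, here is a hedged sketch of what the
msgpack-specific decoding on the htraced side might look like with go/codec.
The {{Span}} type and the surrounding RPC framing are assumptions for
illustration, not the actual htraced code.
{code}
package spanrpc

import "github.com/ugorji/go/codec"

// Span is a stand-in for illustration; the real htraced span struct has more fields.
type Span struct {
	Description string `codec:"d"`
	Begin       uint64 `codec:"b"`
	End         uint64 `codec:"e"`
}

// decodeSpans shows that the msgpack-specific part of the server really is only
// a few lines: make a handle, make a decoder over the request body, decode.
func decodeSpans(body []byte) ([]*Span, error) {
	var mh codec.MsgpackHandle
	dec := codec.NewDecoderBytes(body, &mh)
	var spans []*Span
	if err := dec.Decode(&spans); err != nil {
		return nil, err
	}
	return spans, nil
}
{code}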
bq. another dep on the server that's not one of the ones already in use by the
systems using htrace.
Again, htraced dependencies are not exposed to users. We use godep to pin each
dependency to a known revision and effectively do a static build. This is the
normal way builds are done in golang-- a single binary is created that
contains all the code.
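For anyone unfamiliar with godep: it records every dependency in a Godeps.json
file pinned to an exact revision. A minimal illustrative entry might look like
the following-- the import path and revision here are placeholders, not the
real values:
{code}
{
  "ImportPath": "<htraced import path>",
  "GoVersion": "go1.4",
  "Deps": [
    {
      "ImportPath": "github.com/ugorji/go/codec",
      "Rev": "<pinned commit hash>"
    }
  ]
}
{code}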
bq. I don't think that htrace is even close to mature enough to be encouraging
fracturing the community. There shouldn't be one format for java and one for
some C.
I am planning on making the Java client use the more efficient serialization
format too. If I had more time it would already be done.
bq. I've suggested multiple alternatives with technical reasoning around why
they would be better. I'm -1 on this patch for the technical reasons above.
I am concerned that we are making this decision for political reasons (i.e. X
is more popular) rather than technical. I don't buy into the idea that the
burden of maintaining our own RPC code would be so unbearable that we
absolutely can't do it. I implemented the existing RPC in a weekend, so I know
that it's not that complex.
[edit: rephrasing]
was (Author: cmccabe):
bq. No it's not but lets be honest about what the first 200 installs of this
software are going to be. The first uses will all be in the
hadoop/hbase/phoenix world. If you make life hard for the first users, none are
going to be compelled to use this other places.
How does using msgpack "make life hard"? If I had titled this JIRA something
else, you would never have even known that msgpack was being used under the
hood. We have complete CLASSPATH isolation on the Java client, and complete
dependency pinning on the htraced server. I know because I implemented it.
bq. I can't see any benefit to msgpack:
I already explained many times what the benefits are. It makes the C library
build simpler, a really nice goal. It maps closely to JSON, much more closely
than something like protobuf. It doesn't require a separate definition file
which we'd have to keep in sync with the span code. It allows incremental
serialization which reduces the number of copies we need to make.
bq. A hand coded serialization format on the client side.
It's not hand-coded-- it uses the CMP library. And if we used something
like protobuf, we would still have hand-written code to copy the data from the
{{htrace_span}} class into the protobuf-generated class. It's the same amount
of code either way, and more efficient this way.
bq. Hard coded limit on span data size.
What would possibly make you think this? There is no hard-coded limit-- unless
maybe it's something absurd like 2 GB. Or maybe you're thinking of the 16MB
RPC size limit. You are honestly claiming that a 16 MB limit on span size is a
problem?
bq. I'm suggesting we use something that's used in large environments with lots
of users and lots of developers. The htrace project doesn't have to create a
new rpc format. We have no requirements that couldn't be answered by an already
existing stack. I'm literally saying: Don't invent something here just for
htrace.
Fortran is used in large environments with lots of users and lots of
developers. Your main argument seems to literally be that we should use
something because it's popular. Can you not see any potential flaws in that
argument?
bq. I think that adding msgpack doesn't gain enough to make the added
complexity on the client and the server worth it. gzipped json is pretty fast.
If we need faster and need to have an rpc then using an already existing rpc
stack makes more sense.
Again, I explained several times that "the extra complexity on the server" is
literally 5 or 6 lines-- maybe a little more than that, because I switched to
the more efficient go/codec library, which has a slightly different interface.
The complexity in the C client would be there anyway, because a lot of it
centers on size- and time-based flushing of buffers.
bq. another dep on the server that's not one of the ones already in use by the
systems using htrace.
Again, htraced dependencies are not exposed to users. We use godep to select
the right versions of each dependency and do effectively a static build. This
is the normal way builds are done in golang-- a single binary is created that
contains all the code.
bq. I don't think that htrace is even close to mature enough to be encouraging
fracturing the community. There shouldn't be one format for java and one for
some C.
I am planning on making the Java client use the more efficient serialization
format too. If I had more time it would already be done.
bq. I've suggested multiple alternatives with technical reasoning around why
they would be better. I'm -1 on this patch for the technical reasons above.
"You should use GRPC.io or Thrift because they are more popular" is not a
technical argument. It is a political one. You have alleged several times
that the burden of maintaining our own RPC code would be so unbearable that we
absolutely can't do it-- with no proof whatsoever. But I implemented it in a
weekend, so I know that argument is false. There are a lot of flaws with
Thrift; ask the Impala guys.
> htrace hrpc: use msgpack for serialization
> ------------------------------------------
>
> Key: HTRACE-164
> URL: https://issues.apache.org/jira/browse/HTRACE-164
> Project: HTrace
> Issue Type: Bug
> Affects Versions: 3.2.0
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: HTRACE-164.002.patch
>
>
> htrace HRPC should use msgpack for serialization. Messages serialized using
> msgpack use less space on the wire and use less CPU time to encode. The CMP
> library allows us to include msgpack support easily in the htrace C client.
> There is also good Java and Golang support available.