[
https://issues.apache.org/jira/browse/HTRACE-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530112#comment-14530112
]
Colin Patrick McCabe edited comment on HTRACE-164 at 5/6/15 7:50 PM:
---------------------------------------------------------------------
bq. No it's not but lets be honest about what the first 200 installs of this
software are going to be. The first uses will all be in the
hadoop/hbase/phoenix world. If you make life hard for the first users, none are
going to be compelled to use this other places.
I don't think that using msgpack makes life hard. Users don't need to know
that we are using msgpack under the hood. We have complete CLASSPATH isolation
on the Java client, and complete dependency pinning on the htraced server.
bq. I can't see any benefit to msgpack:
It makes the C library build simpler, which is a really nice goal. It maps
closely to JSON, much more closely than something like protobuf does. It
doesn't require a separate definition file that we'd have to keep in sync with
the span code. And it allows incremental serialization, which reduces the
number of copies we need to make.
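To illustrate the "maps closely to JSON" point, here is a rough sketch-- not
code from the patch-- of encoding a span-shaped map with the go/codec library
mentioned below. The field names and values are made up for illustration; note
that no separate definition file is involved.
{code}
package main

import (
	"bytes"
	"fmt"

	"github.com/ugorji/go/codec"
)

func main() {
	// A span represented as a plain map, the same shape it would have as JSON.
	// There is no .proto/.thrift definition file to keep in sync; the structure
	// is implicit in the data. (Field names here are hypothetical.)
	span := map[string]interface{}{
		"d": "getFileInfo",         // description
		"b": uint64(1430940547000), // begin time, ms
		"e": uint64(1430940547012), // end time, ms
	}

	var mh codec.MsgpackHandle
	var buf bytes.Buffer

	// Encoding is a couple of lines; the same MsgpackHandle is used to decode.
	enc := codec.NewEncoder(&buf, &mh)
	if err := enc.Encode(span); err != nil {
		panic(err)
	}
	fmt.Printf("encoded %d msgpack bytes\n", buf.Len())
}
{code}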
bq. A hand coded serialization format on the client side.
It's not hand-coded-- it uses the CMP library. And if we used something
like protobuf, we would still have hand-written code to copy the data from the
{{htrace_span}} class into the protobuf-generated class. It's the same amount
of code either way, and more efficient this way.
bq. Hard coded limit on span data size.
Disagree. There is no hard-coded limit on span data size-- unless you count
something absurd like 2 GB. Or maybe you're thinking of the 16 MB RPC size
limit; a 16 MB limit on span size is not a problem.
bq. I'm suggesting we use something that's used in large environments with lots
of users and lots of developers. The htrace project doesn't have to create a
new rpc format. We have no requirements that couldn't be answered by an already
existing stack. I'm literally saying: Don't invent something here just for
htrace.
Fortran is used in large environments with lots of users and lots of
developers. Your main argument literally seems to be that we should use
something because it's popular. That is a flawed argument.
bq. I think that adding msgpack doesn't gain enough to make the added
complexity on the client and the server worth it. gzipped json is pretty fast.
If we need faster and need to have an rpc then using an already existing rpc
stack makes more sense.
The extra complexity on the server is literally 5 or 6 lines-- maybe a little
more than that, because I switched to the more efficient go/codec library,
which has a slightly different interface. The complexity in the C client would
be there anyway, because a lot of it centers on size- and time-based flushing
of buffers.
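To be concrete about that claim, here is a hedged sketch of what the
msgpack-specific decoding on the htraced side might look like with go/codec.
The {{Span}} type and the surrounding RPC framing are assumptions for
illustration, not the actual htraced code.
{code}
package spanrpc

import "github.com/ugorji/go/codec"

// Span is a stand-in for illustration; the real htraced span struct has more fields.
type Span struct {
	Description string `codec:"d"`
	Begin       uint64 `codec:"b"`
	End         uint64 `codec:"e"`
}

// decodeSpans shows that the msgpack-specific part of the server really is only
// a few lines: make a handle, make a decoder over the request body, decode.
func decodeSpans(body []byte) ([]*Span, error) {
	var mh codec.MsgpackHandle
	dec := codec.NewDecoderBytes(body, &mh)
	var spans []*Span
	if err := dec.Decode(&spans); err != nil {
		return nil, err
	}
	return spans, nil
}
{code}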
bq. another dep on the server that's not one of the ones already in use by the
systems using htrace.
Again, htraced dependencies are not exposed to users. We use godep to pin each
dependency to a known revision and effectively do a static build. This is the
normal way builds are done in golang-- a single binary is created that
contains all the code.
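For anyone unfamiliar with godep: it records every dependency in a Godeps.json
file pinned to an exact revision. A minimal illustrative entry might look like
the following-- the import path and revision here are placeholders, not the
real values:
{code}
{
  "ImportPath": "<htraced import path>",
  "GoVersion": "go1.4",
  "Deps": [
    {
      "ImportPath": "github.com/ugorji/go/codec",
      "Rev": "<pinned commit hash>"
    }
  ]
}
{code}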
bq. I don't think that htrace is even close to mature enough to be encouraging
fracturing the community. There shouldn't be one format for java and one for
some C.
I am planning on making the Java client use the more efficient serialization
format too. If I had more time it would already be done.
bq. I've suggested multiple alternatives with technical reasoning around why
they would be better. I'm -1 on this patch for the technical reasons above.
I am concerned that we are making this decision for political reasons (i.e. X
is more popular) rather than technical. I don't buy into the idea that the
burden of maintaining our own RPC code would be so unbearable that we
absolutely can't do it. I implemented the existing RPC in a weekend, so I know
that it's not that complex.
[edit: rephrasing]
was (Author: cmccabe):
bq. No it's not but lets be honest about what the first 200 installs of this
software are going to be. The first uses will all be in the
hadoop/hbase/phoenix world. If you make life hard for the first users, none are
going to be compelled to use this other places.
How does using msgpack "make life hard"? If I had titled this JIRA something
else, you would never have even known that msgpack was being used under the
hood. We have complete CLASSPATH isolation on the Java client, and complete
dependency pinning on the htraced server. I know because I implemented it.
bq. I can't see any benefit to msgpack:
I already explained many times what the benefits are. It makes the C library
build simpler, a really nice goal. It maps closely to JSON, much more closely
than something like protobuf. It doesn't require a separate definition file
which we'd have to keep in sync with the span code. It allows incremental
serialization which reduces the number of copies we need to make.
bq. A hand coded serialization format on the client side.
It's not hand-coded-- it uses the CMP library. And if we used something
like protobuf, we would still have hand-written code to copy the data from the
{{htrace_span}} class into the protobuf-generated class. It's the same amount
of code either way, and more efficient this way.
bq. Hard coded limit on span data size.
What would possibly make you think this? There is no hard-coded limit-- unless
maybe it's something absurd like 2 GB. Or maybe you're thinking of the 16MB
RPC size limit. You are honestly claiming that a 16 MB limit on span size is a
problem?
bq. I'm suggesting we use something that's used in large environments with lots
of users and lots of developers. The htrace project doesn't have to create a
new rpc format. We have no requirements that couldn't be answered by an already
existing stack. I'm literally saying: Don't invent something here just for
htrace.
Fortran is used in large environments with lots of users and lots of
developers. Your main argument seems to literally be that we should use
something because it's popular. Can you not see any potential flaws in that
argument?
bq. I think that adding msgpack doesn't gain enough to make the added
complexity on the client and the server worth it. gzipped json is pretty fast.
If we need faster and need to have an rpc then using an already existing rpc
stack makes more sense.
Again, I explained several times that "the extra complexity on the server" is
literally 5 or 6 lines-- maybe a little more than that, because I switched to
the more efficient go/codec library, which has a slightly different interface.
The complexity in the C client would be there anyway, because a lot of it
centers on size- and time-based flushing of buffers.
bq. another dep on the server that's not one of the ones already in use by the
systems using htrace.
Again, htraced dependencies are not exposed to users. We use godep to select
the right versions of each dependency and do effectively a static build. This
is the normal way builds are done in golang-- a single binary is created that
contains all the code.
bq. I don't think that htrace is even close to mature enough to be encouraging
fracturing the community. There shouldn't be one format for java and one for
some C.
I am planning on making the Java client use the more efficient serialization
format too. If I had more time it would already be done.
bq. I've suggested multiple alternatives with technical reasoning around why
they would be better. I'm -1 on this patch for the technical reasons above.
"You should use GRPC.io or Thrift because they are more popular" is not a
technical argument. It is a political one. You have alleged several times
that the burden of maintaining our own RPC code would be so unbearable that we
absolutely can't do it-- with no proof whatsoever. But I implemented it in a
weekend, so I know that argument is false. There are a lot of flaws with
Thrift; ask the Impala guys.
> htrace hrpc: use msgpack for serialization
> ------------------------------------------
>
> Key: HTRACE-164
> URL: https://issues.apache.org/jira/browse/HTRACE-164
> Project: HTrace
> Issue Type: Bug
> Affects Versions: 3.2.0
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: HTRACE-164.002.patch
>
>
> htrace HRPC should use msgpack for serialization. Messages serialized using
> msgpack use less space on the wire and use less CPU time to encode. The CMP
> library allows us to include msgpack support easily in the htrace C client.
> There is also good Java and Golang support available.