[ 
https://issues.apache.org/jira/browse/HTRACE-209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647091#comment-14647091
 ] 

Colin Patrick McCabe commented on HTRACE-209:
---------------------------------------------

Thanks, [~stack].

bq. One of the lads who ran original zipkin versions ("It was a weekend hack 
project") talked of the incidence of collisions at twitter scale being on the 
order of every so hours though they were throwing away the data every few days. 
The thought was that if you kept data longer, the likelihood of collision would 
only rise

Yeah.  It's an example of the Birthday Problem: 
https://en.wikipedia.org/wiki/Birthday_problem
Collisions are a lot more frequent than it intuitively seems like they should 
be.

bq. I like the bit where top 64bits are carried through but this is just 
convention. Anything we can do to formalize this? Spec?

I will add some documentation to the {{SpanId}} class.

bq. On if it walks like a UUID, then... it should be a UUID, why not type 4? 
122 bits are enough?

The problem is this would break our Hadoop / HBase wire compatibility, since 
right now we have Hadoop RPC passing through 128 bits.  I also think giving up 
bits is kind of questionable in general.

Re: string formats.  Let's talk about string formats in a follow-on JIRA...  
it's actually pretty quick to change once the rest of the patch is in.

bq. Seems odd that you'd set all parents on a span? Why not set parent on 
creation and then add parents.... but being able to set them all seems wrong.

Good question.  This is actually the existing API, except that I replaced 
setting the long[] with setting a SpanId[].  So this is no different than the 
current code (just changed the type).

The thinking behind it is that it could get expensive to do "validation" each 
time the parents of a span get set.  For example, if the span has 10 parents, 
and you add a new parent, are you going to iterate through every existing 
parent to ensure that the new parent doesn't duplicate an existing one?  Then 
you have to either sort or create a hash map or something to do that.  Even if 
you could do that efficiently, you'd have to allocate more objects (creating 
more garbage) since you couldn't use the passed-in array or list, but you would 
have to create your own.  So the API just simply allows the client code to set 
parents.

Perhaps we could create a friendlier API, but let's discuss it in a follow-on 
rather than here (this change is big enough :)

bq. SpanId needs a class comment on what it is about.

OK

bq. TraceTree should probably go away or be renamed as TraceGraph

Good point.  Let's rename this to TraceGraph.  I think this is only used by 
tests, as well, so let's just move it into the test directory.  It's not really 
an API we want to be committed to as a public API.

bq. Otherwise, reviewed c and java changes and skimmed go set... looks good to 
me.

great, will post another version which I think is ready to commit

> Make span ID 128 bit to avoid collisions
> ----------------------------------------
>
>                 Key: HTRACE-209
>                 URL: https://issues.apache.org/jira/browse/HTRACE-209
>             Project: HTrace
>          Issue Type: Sub-task
>    Affects Versions: 4.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HTRACE-209.001.patch, HTRACE-209.002.patch, 
> HTRACE-209.004.patch
>
>
> Make span ID 128 bit to avoid collisions in htrace 4.x



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to