[
https://issues.apache.org/jira/browse/HTRACE-209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647091#comment-14647091
]
Colin Patrick McCabe commented on HTRACE-209:
---------------------------------------------
Thanks, [~stack].
bq. One of the lads who ran original zipkin versions ("It was a weekend hack
project") talked of the incidence of collisions at twitter scale being on the
order of every so hours though they were throwing away the data every few days.
The thought was that if you kept data longer, the likelihood of collision would
only rise
Yeah. It's an example of the Birthday Problem:
https://en.wikipedia.org/wiki/Birthday_problem
Collisions are a lot more frequent than it intuitively seems like they should
be.
bq. I like the bit where top 64bits are carried through but this is just
convention. Anything we can do to formalize this? Spec?
I will add some documentation to the {{SpanId}} class.
bq. On if it walks like a UUID, then... it should be a UUID, why not type 4?
122 bits are enough?
The problem is this would break our Hadoop / HBase wire compatibility, since
right now we have Hadoop RPC passing through 128 bits. I also think giving up
bits is kind of questionable in general.
Re: string formats. Let's talk about string formats in a follow-on JIRA...
it's actually pretty quick to change once the rest of the patch is in.
bq. Seems odd that you'd set all parents on a span? Why not set parent on
creation and then add parents.... but being able to set them all seems wrong.
Good question. This is actually the existing API, except that I replaced
setting the long[] with setting a SpanId[]. So this is no different than the
current code (just changed the type).
The thinking behind it is that it could get expensive to do "validation" each
time the parents of a span get set. For example, if the span has 10 parents,
and you add a new parent, are you going to iterate through every existing
parent to ensure that the new parent doesn't duplicate an existing one? Then
you have to either sort or create a hash map or something to do that. Even if
you could do that efficiently, you'd have to allocate more objects (creating
more garbage) since you couldn't use the passed-in array or list, but you would
have to create your own. So the API just simply allows the client code to set
parents.
Perhaps we could create a friendlier API, but let's discuss it in a follow-on
rather than here (this change is big enough :)
bq. SpanId needs a class comment on what it is about.
OK
bq. TraceTree should probably go away or be renamed as TraceGraph
Good point. Let's rename this to TraceGraph. I think this is only used by
tests, as well, so let's just move it into the test directory. It's not really
an API we want to be committed to as a public API.
bq. Otherwise, reviewed c and java changes and skimmed go set... looks good to
me.
great, will post another version which I think is ready to commit
> Make span ID 128 bit to avoid collisions
> ----------------------------------------
>
> Key: HTRACE-209
> URL: https://issues.apache.org/jira/browse/HTRACE-209
> Project: HTrace
> Issue Type: Sub-task
> Affects Versions: 4.0
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: HTRACE-209.001.patch, HTRACE-209.002.patch,
> HTRACE-209.004.patch
>
>
> Make span ID 128 bit to avoid collisions in htrace 4.x
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)