[
https://issues.apache.org/jira/browse/DATASKETCHES-5?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118542#comment-17118542
]
Csaba Ringhofer commented on DATASKETCHES-5:
--------------------------------------------
Thanks for the quick fix!
> things should at least match for keys up to 32 bytes
I think it was only correct for keys up to 16 bytes, because the tail is
offset twice whenever there is at least one 16-byte block.
The issue was discovered by noticing that long strings had suspiciously high
estimates (as it turned out, the over-read led to picking up some randomness).
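To make the 16-byte boundary concrete, here is a minimal sketch of the offset arithmetic only. It assumes the tail is located by adding nblocks * 16 to a pointer that has already been moved past the full blocks. The helper names are hypothetical and this is not the code from MurmurHash3.h; it models byte offsets from the start of the key and never touches memory.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers for illustration only; they model byte offsets from the
// start of the key and do not reproduce the code in MurmurHash3.h.

// Where the tail should start: right after the last full 16-byte block.
inline std::size_t expected_tail_offset(std::size_t len) {
  const std::size_t nblocks = len / 16;
  return nblocks * 16;
}

// Where the tail ends up if the block pointer was already advanced past the
// full blocks and the same offset is then added a second time (assumption).
inline std::size_t doubled_tail_offset(std::size_t len) {
  const std::size_t nblocks = len / 16;
  std::size_t cursor = 0;        // stands in for the 'blocks' pointer
  cursor += nblocks * 16;        // first offset: pointer moved past the blocks
  return cursor + nblocks * 16;  // second offset: tail computed from moved pointer
}

// How far past the end of a len-byte key the tail read reaches, in bytes.
inline std::size_t over_read_extent(std::size_t len) {
  const std::size_t tail_len = len % 16;
  if (tail_len == 0) return 0;  // no tail bytes are read at all
  const std::size_t end = doubled_tail_offset(len) + tail_len;
  return end > len ? end - len : 0;
}

// Small self-check of the boundary cases discussed above.
inline void check_examples() {
  assert(doubled_tail_offset(15) == expected_tail_offset(15));  // no full block
  assert(over_read_extent(16) == 0);                            // empty tail
  assert(doubled_tail_offset(17) != expected_tail_offset(17));  // first bad length
}
```

Under these assumptions, keys of 15 or 16 bytes are unaffected, while a 17-byte key has its single tail byte read from offset 32 instead of offset 16.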
> Buffer over-read in MurmurHash3_x64_128
> ---------------------------------------
>
> Key: DATASKETCHES-5
> URL: https://issues.apache.org/jira/browse/DATASKETCHES-5
> Project: Apache Datasketches
> Issue Type: Bug
> Reporter: Csaba Ringhofer
> Assignee: Jon Malkin
> Priority: Critical
>
> MurmurHash3_x64_128 seems to contain a half-commented-out change that causes
> the offset to be added to the key twice.
> 'blocks' is advanced here:
> https://github.com/apache/incubator-datasketches-cpp/blob/2941841dda921026a5dc2052388461d9295dc0b0/common/include/MurmurHash3.h#L128
> but the following lines assume that 'blocks' still points to the start of
> the key:
> https://github.com/apache/incubator-datasketches-cpp/blob/2941841dda921026a5dc2052388461d9295dc0b0/common/include/MurmurHash3.h#L115
> https://github.com/apache/incubator-datasketches-cpp/blob/2941841dda921026a5dc2052388461d9295dc0b0/common/include/MurmurHash3.h#L133