gianm commented on pull request #11201: URL: https://github.com/apache/druid/pull/11201#issuecomment-836992606
> This is not correct, at least for the HLL in datasketches-java (I'm not sure what the Druid adaptor does). Strings are encoded using UTF-8 and have been for as long as I can remember. If you wish to use UTF-16, you just convert your string to char[] and the HLL sketch will accept that as well.

@leerho Understood, but it is true as far as Druid is concerned. The HllSketch-based aggregator implementation in Druid calls `update(s.toCharArray())`, not `update(s)`: https://github.com/apache/druid/blob/8296123d895db7d06bc4517db5e767afb7862b83/extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/hll/HllSketchBuildAggregator.java#L103

> Nonetheless, whatever you decide, you will always need to stick with your choice.

Yep, that's why this must be an option and the choice needs to be made in a consistent way.

> I have some comments about PR 353 but I want to make these in the actual PR.

Thanks!
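
To make the `update(s)` versus `update(s.toCharArray())` distinction concrete, here is a minimal sketch using only the Java standard library. It does not call DataSketches or Druid. It only shows that the same string yields different input bytes depending on whether it is first encoded as UTF-8 or passed as 16-bit chars, which is why the two paths cannot be mixed within one sketch. The class name `EncodingDemo`, its helper methods, and the little-endian byte order are assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingDemo {
    // Bytes produced when the string is encoded as UTF-8 first,
    // which is what the quoted comment describes for update(String).
    static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Bytes backing s.toCharArray(): one 16-bit UTF-16 code unit per char.
    // Little-endian order is an assumption chosen for this illustration.
    static byte[] utf16Bytes(String s) {
        char[] chars = s.toCharArray();
        ByteBuffer buf = ByteBuffer.allocate(chars.length * 2)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (char c : chars) {
            buf.putChar(c);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        String s = "druid";
        System.out.println("UTF-8:  " + Arrays.toString(utf8Bytes(s)));
        System.out.println("UTF-16: " + Arrays.toString(utf16Bytes(s)));
        // Different inputs to the hash function mean different hashes,
        // so values ingested one way will not match values ingested the other way.
        System.out.println("same bytes? " + Arrays.equals(utf8Bytes(s), utf16Bytes(s)));
    }
}
```

Even for a plain ASCII string the two byte sequences differ in length, so the encoding choice has to be fixed up front and applied consistently.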
