Folks,

Turns out we made a mistake back on June 30, 2017, in Hive release 0.10.2
<https://github.com/apache/incubator-datasketches-hive/releases/tag/sketches-hive-0.10.2>,
long before we moved to the ASF.  Having Strings encoded as char[] instead of
UTF-8 has created a cross-system incompatibility between sketches built in
Hive from strings and sketches built in any other system that feeds the
sketch UTF-8 encoded strings.  Oops!
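
To make the mismatch concrete, here is a minimal Java sketch of why the two paths disagree. It does not call the sketch library at all. The class and method names are made up for illustration, and it assumes the char[] path ends up hashing the raw 16-bit code units (laid out little-endian here), which may not match the library byte for byte. The point is only that the same string yields different input bytes, so the hashes cannot line up.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustration only: hypothetical names, no sketch library involved.
class EncodingMismatch {

    // Bytes a hash function would see when the string is fed as UTF-8.
    static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Bytes a hash function would see when the string is fed as a char[].
    // Assumption: each char contributes its two raw bytes, little-endian.
    static byte[] charArrayBytes(String s) {
        char[] chars = s.toCharArray();
        ByteBuffer buf = ByteBuffer.allocate(chars.length * 2)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (char c : chars) {
            buf.putChar(c);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        String item = "abc";
        System.out.println(Arrays.toString(utf8Bytes(item)));      // [97, 98, 99]
        System.out.println(Arrays.toString(charArrayBytes(item))); // [97, 0, 98, 0, 99, 0]
    }
}
```

Even for plain ASCII the inputs differ in length (3 bytes vs 6), so a sketch fed one way will not treat an item as the same item when another system feeds it the other way.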

A Hive issue
<https://github.com/apache/incubator-datasketches-hive/issues/54> was
created 18 days ago about this problem.

I don’t see a good solution here.  We could either document the hell out of
it to warn users, or perhaps mark the Hive SketchState update(...) method
deprecated and create an alternate method “newUpdate(...)” that uses UTF-8
for strings.
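
Roughly, the second option could look like the sketch below. This is a stand-in class, not the real Hive SketchState: the class name, the `fed` list, and the String parameter are all assumptions for illustration, and it only records what would be handed to the underlying sketch instead of updating one.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Hive SketchState; not the real class.
class HypotheticalSketchState {

    // Records what would be passed to the underlying sketch, in order.
    final List<Object> fed = new ArrayList<>();

    /**
     * Legacy behavior: the string is fed as a char[].
     *
     * @deprecated incompatible with systems that feed UTF-8 encoded strings;
     *             kept so that existing Hive sketches can still be extended.
     */
    @Deprecated
    void update(String value) {
        fed.add(value.toCharArray());
    }

    // Proposed alternative: the string is fed as UTF-8 bytes.
    void newUpdate(String value) {
        fed.add(value.getBytes(StandardCharsets.UTF_8));
    }
}
```

Keeping the old method around matters because sketches already built in Hive can only be extended consistently with the old encoding; anything built with newUpdate(...) would line up with the other systems but not with existing Hive sketches.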

Comments and brilliant suggestions welcome!  😀

Lee.
