Zoltan,

This is super! We look forward to any PRs that you think belong in the library and would make this integration easier or more complete.

You might also want to look at this suggestion that came to us recently:

> Support for "advanced" SQL types (in HLL) (in Hive)
> <https://lists.apache.org/thread.html/r8660d693f17752d3ccfa09dc6a76a862f179e1868a33f9918b16c00f%40%3Cusers.datasketches.apache.org%3E>

You are in a better position to evaluate these issues than we are :)

Also, at some point it may make more sense to move the code we have in our datasketches-hive repository into Hive (we would then deprecate our Hive repo), where it can more easily be kept up to date as Hive evolves. This is how it works with our Druid integration. Integrating the DataSketches library tightly with Hive will provide significantly improved performance, but it requires much more intimate knowledge of the internals of Hive than we have in our DataSketches team.

Please stay in touch with us!

Cheers,
Lee.

On Tue, Jul 7, 2020 at 3:41 AM Zoltan Haindrich <[email protected]> wrote:

> Hey All!
>
> In recent months I have been working with Jesus Camacho Rodriguez on
> integrating DataSketches more tightly with Hive [1].
>
> So, from Hive 4.0, almost all the datasketches functions will be
> available by default. To do this, I had to come up with a naming
> convention (ds_{sketchType}_{functionName}) to register all ds functions.
> I will contribute back some of these changes, but so far I have been able
> to avoid changing datasketches-hive at all. I've noticed that some
> "simple" functions are missing; they should be there just for
> completeness (iirc mostly the toString function and probably a few more).
>
> Probably the most interesting part for you is that, by utilizing Calcite,
> a set of rules can transparently rewrite
> COUNT(DISTINCT)/PERCENTILE_DISC/CUME_DIST/RANK/NTILE to use
> sketch functions! :)
> Materialized views are also supported, so that sketches can be stored
> precomputed (and rolled up).
>
> If you would like a quick look at what it does, the test for rewriting
> rank [2] shows a few statements.
>
> Thank you for this great library!
>
> cheers,
> Zoltan
>
> [1] https://issues.apache.org/jira/browse/HIVE-22939
> [2] https://github.com/apache/hive/blob/e4256fc91fe2c123428400f3737883a83208d29e/ql/src/test/queries/clientpositive/sketches_rewrite_rank.q#L15

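For readers skimming the thread, the naming convention quoted above, ds_{sketchType}_{functionName}, can be illustrated with a small Python sketch. This is not Hive code, and the real rewrite is done by Calcite rules inside Hive. The helper names below are hypothetical, and the sketch type "hll" with function names "sketch" and "estimate" are assumptions chosen for illustration; the message itself does not list specific function names.

```python
def ds_function_name(sketch_type: str, function_name: str) -> str:
    """Build a name following the ds_{sketchType}_{functionName} pattern.

    Only the pattern comes from the message; this helper is hypothetical.
    """
    return f"ds_{sketch_type.lower()}_{function_name.lower()}"


def approx_count_distinct_sql(column: str, table: str) -> str:
    """Show the rough shape a COUNT(DISTINCT) query could take once it is
    expressed with sketch functions.

    Assumption: an "hll" sketch type with "sketch" (aggregate) and
    "estimate" (read-out) functions. This only builds a string.
    """
    build = ds_function_name("hll", "sketch")
    estimate = ds_function_name("hll", "estimate")
    return f"SELECT {estimate}({build}({column})) FROM {table}"


if __name__ == "__main__":
    print(ds_function_name("kll", "rank"))
    print(approx_count_distinct_sql("id", "sketch_input"))
```

The point of the convention is that the sketch family and the operation can both be read directly from the registered function name, which keeps the many sketch functions from colliding with each other or with built-in functions.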