[GitHub] [druid] clintropolis commented on pull request #13370: Unsigned integer druid complex column

GitBox Thu, 08 Dec 2022 14:52:32 -0800


clintropolis commented on PR #13370:
URL: https://github.com/apache/druid/pull/13370#issuecomment-1343475115

oops, sorry for the delay

>@clintropolis i looked at the long encoding work some more and think it
handles everything for us. If we take a look at VSizeLongSerde.getSerializer()
it looks like it only stores the bits it needs.

yes, it does bit-packing so should effectively achieve the same thing

>I think this PR is not necessary with this feature, but I do have a
suggestion. I believe this patch has been in for a while and we should make
long encoding on by default. When I tested the encoding vs unsigned int, it
encoded better because it rarely stored leading 0's. What do you think about me
closing this PR and having another one to have long encoding on by default in
druid

I think it would be reasonable to turn on by default. I used to have some
worries about the performance since the abstraction seems to cause some
overhead, especially in the non-vectorized engine, at least the last time I
measured this
https://user-images.githubusercontent.com/1577461/42849379-d1483132-89d7-11e8-8cdd-2382690d70b6.gif
as seen by this chart where the 'auto' encoded data grew at a faster rate than
the 'longs' encoding (top chart is basically segment scan time from 0 to 100%
selection) that i collected as part of ancient #6016 (which maybe someday I
will get back to...).

But, the difference wasn't huge, and I think the vectorization improvements
done as part of #11004 probably make up for this, so Im ok with switching the
default.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [druid] clintropolis commented on pull request #13370: Unsigned integer druid complex column

Reply via email to