[GitHub] clintropolis commented on issue #6016: Druid 'Shapeshifting' Columns

GitBox Thu, 02 Aug 2018 01:38:11 -0700

clintropolis commented on issue #6016: Druid 'Shapeshifting' Columns
URL: https://github.com/apache/incubator-druid/pull/6016#issuecomment-409850996
 
 
   @leventov @himanshug I think I've got another viable, maybe even _better_, 
variant of this general idea that I can craft with relatively minor changes - 
that could eliminate the need for primitive arrays entirely and move everything 
back to off-heap direct buffers and even help simplify the code quite a bit. 
   
   My weekend fun hack project (which has also spilled into every night this 
week) was to write a JNI wrapper around [the native version of 
FastPFor](https://github.com/lemire/FastPFor), and then plug that and all of 
its algorithms in as another encoder/decoder option to experiment with. This 
is an itch I've wanted to scratch ever since I started working with this stuff, 
because I was curious how pure Java compares to calling native code from Java. 
I have a lot more testing and benchmarking to do, and the SIMD versions of the 
codecs seem to be finicky about memory alignment, but it seems possible to 
achieve even better performance gains going native, based on my limited 
observations so far. 
This is using the same direct buffers from the compression pool as lz4 
bytepacking, so memory footprint if we go this way should be very similar to 
what it is now (plus whatever the native code is allocating). 
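
   On the alignment point, here is a minimal sketch of one way to hand native 
code an aligned slice of a direct buffer. It assumes Java 9+ (for 
`ByteBuffer.alignmentOffset`); `AlignedBuffers` and `allocateAligned` are 
hypothetical names for illustration, not anything that exists in Druid or 
FastPFor, and the alignment a given codec actually needs is not something I've 
pinned down. 

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public final class AlignedBuffers {
    private AlignedBuffers() {}

    // Hypothetical helper: returns a direct buffer of exactly `capacity` bytes
    // whose first byte sits at an address that is a multiple of `alignment`.
    public static ByteBuffer allocateAligned(int capacity, int alignment) {
        if (capacity < 0 || alignment <= 0 || Integer.bitCount(alignment) != 1) {
            throw new IllegalArgumentException("alignment must be a power of two");
        }
        // Over-allocate so an aligned start is guaranteed to exist in the block.
        ByteBuffer raw = ByteBuffer.allocateDirect(capacity + alignment - 1);
        // Address of index 0 modulo the alignment (Java 9+).
        int misalignment = raw.alignmentOffset(0, alignment);
        int start = (alignment - misalignment) % alignment;
        raw.position(start);
        raw.limit(start + capacity);
        // slice() resets byte order to big-endian, so set native order explicitly.
        return raw.slice().order(ByteOrder.nativeOrder());
    }
}
```

   The slice shares memory with the over-allocated buffer, so the extra cost 
is at most `alignment - 1` bytes per allocation. 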
   
   The major downside is that the FastPFOR algorithm implementations do not 
seem to be compatible with each other, so it could be painful to switch later 
on (at least the SIMD version and the Java version are not; I haven't tried the 
non-SIMD version with the Java version yet, so maybe there is still hope). I 
suppose it is also possible that this is a bug in one of the libraries.
   
   We would also need to consider how we want to maintain this native 
mapping. I'm currently building all the native parts by hand and stuffing them 
as resources into a standalone package which I can install with Maven locally 
to test, but I'm a bit fuzzy on where to go from there and don't really know 
what the legit way to do this is (I was modeling it on the lz4 native library). 
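
   For reference, a rough sketch of the "native libs as resources" pattern: 
pick a per-platform resource path, copy the bundled library to a temp file, and 
`System.load` it. The class name, the `/native/<os>/<arch>/` layout, and the 
library name are all assumptions made up for illustration, not the layout of 
any existing package. 

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Locale;

public final class NativeResourceLoader {
    private NativeResourceLoader() {}

    // Maps (library, os.name, os.arch) to a classpath resource under a
    // hypothetical /native/<os>/<arch>/ layout.
    public static String resourcePath(String libName, String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String osDir;
        String fileName;
        // Check mac/darwin before "win", since "darwin" contains "win".
        if (os.contains("mac") || os.contains("darwin")) {
            osDir = "darwin";
            fileName = "lib" + libName + ".dylib";
        } else if (os.contains("win")) {
            osDir = "windows";
            fileName = libName + ".dll";
        } else {
            osDir = "linux";
            fileName = "lib" + libName + ".so";
        }
        return "/native/" + osDir + "/" + osArch + "/" + fileName;
    }

    // Extracts the bundled library for the current platform and loads it.
    public static void load(String libName) throws IOException {
        String resource = resourcePath(
            libName, System.getProperty("os.name"), System.getProperty("os.arch"));
        try (InputStream in = NativeResourceLoader.class.getResourceAsStream(resource)) {
            if (in == null) {
                throw new UnsatisfiedLinkError("No bundled native library at " + resource);
            }
            String suffix = resource.substring(resource.lastIndexOf('.'));
            Path tmp = Files.createTempFile(libName, suffix);
            tmp.toFile().deleteOnExit();
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            System.load(tmp.toAbsolutePath().toString());
        }
    }
}
```

   A JNI binding class would then call something like 
`NativeResourceLoader.load("fastpfor")` from a static initializer before any 
native method is invoked. 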
   
   I might be getting ahead of myself, but if we were to pursue this approach, 
I would assume we want to maintain this as a package in druid, maybe something 
like `druid-native-processing`? I think we want a package _somewhere_ which 
could hold the native java sources, JNI headers and sources, maybe git 
submodules of 3rd party native libraries, and pre-built versions of those 
libraries in the resources of the package. It would probably be a pain to set 
up cross compilation to build the packaged native libs in CI, but I think it 
would be useful at least to be able to build them manually from within the 
package. There are some Maven plugins for building native code that I need to 
look into further if we get serious about this.
   
   I'm going to keep playing with this to see if I can get it operating 
smoothly. A refactor should be relatively painless and quick, so I'll make a 
branch to sketch out what it might look like if further testing is promising.


