himanshug commented on issue #7919: disable all compression in intermediate 
segment persists while ingestion
URL: https://github.com/apache/incubator-druid/pull/7919#issuecomment-503387212
 
 
   @clintropolis all the PRs you linked are independent improvements to
indexing, index merging, compression, freeing buffers as early as possible,
etc., which are great and will land at some point.
   This PR is an immediate solution to the problem, so I am glad that you
agree that having a separate IndexSpec for intermediate persisted segments
makes sense.
   
   Now the real question for this PR is whether or not to change the default
IndexSpec for intermediate persisted segments.
   
   From a code perspective it is fairly trivial to retain the existing default
behavior, and I can make that change if that is what most people want; however,
here is my rationale for changing the default.
   
   My observation on different clusters has been that the peak memory usage of
the current indexing task process is much higher than its average utilization
during ingestion, because of the merge performed at publish time. As a result,
users provision the task process with the "peak" memory for its entire
lifetime.
   When compression is disabled on intermediate segments, average memory
utilization would increase (more page cache is used), but overall peak memory
usage would decrease because no decompression buffers are allocated at merge
time. Also, queries served from those segments would run faster because the
data is stored uncompressed.
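   The peak-memory argument above can be put as a rough back-of-envelope
model. This is a sketch under stated assumptions, not a measured result: it
assumes the merge holds one decompression buffer of size B open for each
compressed column of each intermediate segment being merged. N_seg, N_col and
B are illustrative symbols, and the 64 KiB buffer size and counts in the
example are hypothetical values, not numbers taken from any cluster.

```latex
% Assumed model: one decompression buffer of size B per compressed column
% per intermediate segment held open during the final merge.
M_{\text{peak}} \approx M_{\text{ingest}} + N_{\text{seg}} \cdot N_{\text{col}} \cdot B
% Hypothetical example: N_seg = 100, N_col = 50, B = 64 KiB
N_{\text{seg}} \cdot N_{\text{col}} \cdot B = 100 \cdot 50 \cdot 64\,\text{KiB}
  = 320{,}000\,\text{KiB} = 312.5\,\text{MiB}
```

   Under this model, disabling compression on intermediate segments removes
the second term from the peak, while the extra cost shows up as higher page
cache usage spread across ingestion instead.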
   That said, you are right: the above assumptions will not hold on some
clusters because of the specifics of their datasets, which these configurations
depend on. There is no single choice that is good for everyone (or else we
wouldn't have those configs :) ).
   If we keep the existing behavior as the default, then I'm afraid very few
cluster operators will use the config introduced here to disable compression on
intermediate segments. OTOH, with the changed default, things would improve in
most cases, and where they don't, operators can use the config to get back the
older behavior.
   Or maybe I am totally wrong, problems will show up on various test clusters
upgrading to RCs, and we will reinstate the old behavior as the default in a
follow-up PR.
