[GitHub] [druid] Abhyuday99 opened a new issue #9872: Parallelization across columns

GitBox Fri, 15 May 2020 03:35:28 -0700


Abhyuday99 opened a new issue #9872:
URL: https://github.com/apache/druid/issues/9872



   I am running a sum query over a varying number of columns. For the same 
time frame of a single day, I increase the number of columns in the query in 
powers of two and get the results below. **The timings were measured with 
useCache and populateCache set to false in the query context.**
   All timings:
   
   1 column - 71.37 ms
   2 columns - 72.05 ms
   4 columns - 73.98 ms
   8 columns - 79.85 ms
   16 columns - 91.32 ms
   32 columns - 117.01 ms
   64 columns - 233.25 ms
   128 columns - 478.25 ms
   256 columns - 861.46 ms
   512 columns - 1740.79 ms
   1024 columns - 3210.88 ms
   2048 columns - 8728.27 ms
   4096 columns - 20375.19 ms
   
   These numbers suggest that query time is nearly flat up to 32 columns, then 
grows roughly linearly with the number of columns up to 1024 columns, and 
faster than linearly beyond that. 
   Does Druid parallelize across columns within a single query?
   Also, why does performance drop so sharply for a large number of columns?
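
As a rough check on the scaling described above, the sketch below computes how much the latency grows each time the column count doubles, using only the timings listed in this issue. The helper name `doubling_ratios` is made up for illustration. A ratio near 1 means fixed overhead dominates, a ratio near 2 means roughly linear growth, and a ratio well above 2 means worse than linear.

```python
# Timings copied from the list above: column count -> latency in ms.
timings_ms = {
    1: 71.37, 2: 72.05, 4: 73.98, 8: 79.85, 16: 91.32, 32: 117.01,
    64: 233.25, 128: 478.25, 256: 861.46, 512: 1740.79,
    1024: 3210.88, 2048: 8728.27, 4096: 20375.19,
}


def doubling_ratios(timings):
    """Return {n: t(n) / t(n/2)} for every column count n whose half was also measured."""
    return {
        n: timings[n] / timings[n // 2]
        for n in sorted(timings)
        if n // 2 in timings
    }


ratios = doubling_ratios(timings_ms)
for n, ratio in ratios.items():
    per_column = timings_ms[n] / n  # average cost per column at this width
    print(f"{n // 2:>4} -> {n:>4} columns: x{ratio:.2f}  ({per_column:.2f} ms/column)")
```

With these measurements the ratio stays below 1.3 up to 32 columns, sits between about 1.8 and 2.05 from 64 to 1024 columns, and exceeds 2.3 for 2048 and 4096 columns.
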
   Druid version = druid-0.15.0-incubating
   Number of historical servers = 4
   Segment granularity = Day
   Size of segment = 106-113 MB
   Rows in a segment = 120,000 – 130,000
   Columns in each segment = 5527 with 40 dimensions and 5487 metrics
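
For reference, a query of the kind described here might be built as in the sketch below. This is only an illustration under assumptions: the issue does not state the query type or aggregator type, so a native `timeseries` query with `doubleSum` aggregators is assumed, and the datasource name, interval, and metric column names are placeholders. The `useCache` and `populateCache` flags are set in the query context, as in the measurements.

```python
import json


def build_sum_query(datasource, interval, metric_columns):
    """Build a native Druid timeseries query that sums each given metric column.

    Assumption: timeseries + doubleSum; the original issue does not say which
    query type or aggregator type was used.
    """
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "intervals": [interval],
        "granularity": "all",
        "aggregations": [
            {"type": "doubleSum", "name": f"sum_{column}", "fieldName": column}
            for column in metric_columns
        ],
        # Disable caching so each run measures the actual scan.
        "context": {"useCache": False, "populateCache": False},
    }


# Placeholder names: "my_datasource" and "metric_<i>" are hypothetical.
columns = [f"metric_{i}" for i in range(4)]
query = build_sum_query("my_datasource", "2020-05-14/2020-05-15", columns)
print(json.dumps(query, indent=2))
```

Changing the length of `columns` to successive powers of two would reproduce the sequence of query widths timed above.
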
   


