Abhyuday99 opened a new issue #9872: URL: https://github.com/apache/druid/issues/9872
I am running a sum query over different numbers of columns. For the same one-day time frame, I increase the number of columns in the query in powers of two and get the results below. **The timings were measured with `useCache` and `populateCache` set to false in the query context.**

All timings:

| Columns | Time (ms) |
|---:|---:|
| 1 | 71.37 |
| 2 | 72.05 |
| 4 | 73.98 |
| 8 | 79.85 |
| 16 | 91.32 |
| 32 | 117.01 |
| 64 | 233.25 |
| 128 | 478.25 |
| 256 | 861.46 |
| 512 | 1740.79 |
| 1024 | 3210.88 |
| 2048 | 8728.27 |
| 4096 | 20375.19 |

Up to about 16 columns the query time barely changes. From about 32 to 1024 columns, performance degrades linearly: the time roughly doubles each time the number of columns doubles. Beyond 1024 columns it degrades even faster.

- Does Druid parallelize over columns within a single query?
- Why does performance drop so much for a large number of columns?

Environment:

- Druid version: druid-0.15.0-incubating
- Number of historical servers: 4
- Segment granularity: day
- Segment size: 106–113 MB
- Rows per segment: 120,000–130,000
- Columns per segment: 5527 (40 dimensions and 5487 metrics)
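For anyone trying to reproduce these measurements, a query of this shape can be sketched as below. This is a minimal sketch under assumptions: the issue does not say which query type or aggregator was used, so it assumes a Druid native `timeseries` query with one `doubleSum` aggregator per column. The datasource name, column names, and interval are hypothetical placeholders. Only the `useCache`/`populateCache` context settings come from the issue.

```python
import json


def build_sum_query(datasource, metric_columns, interval):
    """Build a Druid native timeseries query that sums each given column.

    Assumptions (hypothetical): the datasource/column names, the choice of
    doubleSum (longSum would fit long-typed metrics), and granularity "all".
    """
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "all",
        "intervals": [interval],
        # One sum aggregator per column; this list grows as 1, 2, 4, ... 4096.
        "aggregations": [
            {"type": "doubleSum", "name": f"sum_{col}", "fieldName": col}
            for col in metric_columns
        ],
        # Disable the query cache so each run measures a real segment scan.
        "context": {"useCache": False, "populateCache": False},
    }


if __name__ == "__main__":
    # Hypothetical metric names standing in for the real ones.
    columns = [f"metric_{i}" for i in range(2)]
    query = build_sum_query("my_datasource", columns, "2020-01-01/2020-01-02")
    print(json.dumps(query, indent=2))
```

The resulting JSON would be POSTed to the Broker's native query endpoint. Varying only the length of `metric_columns` keeps every other factor (interval, segments scanned, cache settings) fixed between runs.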

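To check the scaling claim against the reported numbers, the sketch below divides each timing by the timing for half as many columns. A ratio near 2 means query time is proportional to the number of columns. On the data above the ratios stay close to 1 for small column counts, sit near 2 from 64 to 1024 columns, and rise above 2 for 2048 and 4096 columns. The function name is illustrative only.

```python
# Timings reported in this issue (milliseconds), keyed by number of summed columns.
TIMINGS_MS = {
    1: 71.37, 2: 72.05, 4: 73.98, 8: 79.85, 16: 91.32, 32: 117.01,
    64: 233.25, 128: 478.25, 256: 861.46, 512: 1740.79,
    1024: 3210.88, 2048: 8728.27, 4096: 20375.19,
}


def doubling_ratios(timings):
    """Return (columns, time / time for the previous, half-as-large count) pairs."""
    cols = sorted(timings)
    return [(c, timings[c] / timings[prev]) for prev, c in zip(cols, cols[1:])]


if __name__ == "__main__":
    for cols, ratio in doubling_ratios(TIMINGS_MS):
        print(f"{cols:>5} columns: {ratio:.2f}x the previous timing")
```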