Yes.  This is for the Max Activity and Max Bandwidth reports.

Karl
On Sun, Sep 19, 2010 at 2:13 PM, Alexey Serba <ase...@gmail.com> wrote:
> And all of this is only with the single table repohistory, right? Is this
> some kind of complex analytics/stats?
>
> On Sun, Sep 19, 2010 at 8:48 PM, Karl Wright <daddy...@gmail.com> wrote:
>> Here you go:
>>
>> // The query we will generate here looks like this:
>> // SELECT *
>> // FROM
>> //   (SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.bytecount AS bytecount,
>> //       t3.windowstart AS starttime, t3.windowend AS endtime
>> //   FROM (SELECT * FROM
>> //     (SELECT t0.bucket AS bucket, t0.starttime AS windowstart, t0.starttime + <interval> AS windowend,
>> //         SUM(t1.datasize * ((case when t0.starttime + <interval> < t1.endtime then t0.starttime + <interval> else t1.endtime end) -
>> //           (case when t0.starttime > t1.starttime then t0.starttime else t1.starttime end))
>> //           / (t1.endtime - t1.starttime)) AS bytecount
>> //       FROM (SELECT DISTINCT substring(entityid from '<bucketregexp>') AS bucket, starttime FROM repohistory WHERE <criteria>) t0, repohistory t1
>> //       WHERE t0.bucket=substring(t1.entityid from '<bucket_regexp>')
>> //         AND t1.starttime < t0.starttime + <interval> AND t1.endtime > t0.starttime
>> //         AND <criteria on t1>
>> //       GROUP BY bucket,windowstart,windowend
>> //     UNION
>> //     SELECT t0a.bucket AS bucket, t0a.endtime - <interval> AS windowstart, t0a.endtime AS windowend,
>> //         SUM(t1a.datasize * ((case when t0a.endtime < t1a.endtime then t0a.endtime else t1a.endtime end) -
>> //           (case when t0a.endtime - <interval> > t1a.starttime then t0a.endtime - <interval> else t1a.starttime end))
>> //           / (t1a.endtime - t1a.starttime)) AS bytecount
>> //       FROM (SELECT DISTINCT substring(entityid from '<bucketregexp>') AS bucket, endtime FROM repohistory WHERE <criteria>) t0a, repohistory t1a
>> //       WHERE t0a.bucket=substring(t1a.entityid from '<bucket_regexp>')
>> //         AND t1a.starttime < t0a.endtime AND t1a.endtime > t0a.endtime - <interval>
>> //         AND <criteria on t1a>
>> //       GROUP BY bucket,windowstart,windowend) t2
>> //   ORDER BY bucket ASC, bytecount DESC) t3) t4
>> // ORDER BY xxx LIMIT yyy OFFSET zzz;
>>
>> I have low confidence that ANY planner would be able to locate the
>> common part of a 2x-larger query and not do it twice.
>>
>> Karl
>>
>> On Sun, Sep 19, 2010 at 12:05 PM, Alexey Serba <ase...@gmail.com> wrote:
>>>> The other thing is that we cannot afford to use the same "table"
>>>> twice, as it is actually an extremely expensive query in its own
>>>> right, with multiple joins, select distinct's, etc. under the covers.
>>> Even if you create indexes on the bucket and activitycount columns? It
>>> might be that the query plans for these two queries (with the "distinct
>>> on" hack and subquery max / subquery order limit / join) would be the
>>> same.
>>>
>>>> I'd be happy to post it but it may shock you. ;-)
>>> The way I indent SQL queries should say that I'm not afraid of
>>> multipage queries :)
>>>
>>>> Karl
>>>>
>>>> On Sun, Sep 19, 2010 at 11:32 AM, Alexey Serba <ase...@gmail.com> wrote:
>>>>>> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount AS
>>>>>> activitycount, t3.windowstart AS starttime, t3.windowend AS endtime FROM
>>>>>> (...) t3
>>>>> Do you have a primary key in your t3 table?
>>>>>
>>>>>> In Postgresql, what this does is to return the FIRST entire row matching
>>>>>> each distinct idbucket result.
>>>>> FIRST based on which sort?
>>>>>
>>>>> Let's say you want to return the FIRST row based on the t3.windowstart
>>>>> column, and you have a primary key in the t3 table. Then I believe your
>>>>> query can be rewritten in the following ways:
>>>>>
>>>>> 1. Using subqueries:
>>>>>
>>>>> SELECT
>>>>>   bucket, primary_key, windowstart, etc
>>>>> FROM
>>>>>   table AS t1
>>>>> WHERE
>>>>>   windowstart = ( SELECT max(windowstart) FROM table AS t2 WHERE
>>>>>     bucket = t1.bucket )
>>>>>
>>>>> 2. Using joins instead of subqueries (in case Derby doesn't support
>>>>> subqueries - not sure about that):
>>>>>
>>>>> SELECT
>>>>>   t1.bucket, t1.primary_key, windowstart, etc
>>>>> FROM
>>>>>   table AS t1
>>>>>   LEFT OUTER JOIN table AS t2 ON ( t1.bucket = t2.bucket AND
>>>>>     t2.windowstart > t1.windowstart )
>>>>> WHERE
>>>>>   t2.primary_key IS NULL
>>>>>
>>>>> HTH,
>>>>> Alex
>>>>>
>>>>> On Sat, Sep 18, 2010 at 2:28 PM, Karl Wright <daddy...@gmail.com> wrote:
>>>>>> Hi Folks,
>>>>>>
>>>>>> For two of the report queries, ACF uses the following Postgresql
>>>>>> construct, which sadly seems to have no Derby equivalent:
>>>>>>
>>>>>> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount
>>>>>> AS activitycount, t3.windowstart AS starttime, t3.windowend AS endtime
>>>>>> FROM (...) t3
>>>>>>
>>>>>> In Postgresql, what this does is to return the FIRST entire row
>>>>>> matching each distinct idbucket result. If Derby had a "FIRST()"
>>>>>> aggregate function, it would be the equivalent of:
>>>>>>
>>>>>> SELECT t3.bucket AS idbucket, FIRST(t3.activitycount) AS
>>>>>> activitycount, FIRST(t3.windowstart) AS starttime, FIRST(t3.windowend)
>>>>>> AS endtime FROM (...) t3 GROUP BY t3.bucket
>>>>>>
>>>>>> Unfortunately, Derby has no such aggregate function. Furthermore, it
>>>>>> would not be ideal if I were to do the work myself in ACF, because
>>>>>> this is a resultset that needs to be paged through with offset and
>>>>>> length, for presentation to the user and sorting, so it gets wrapped
>>>>>> in another SELECT ... FROM (...) ORDER BY ... OFFSET ... LIMIT ...
>>>>>> that does that part.
>>>>>>
>>>>>> Does anyone have any ideas and/or Derby contacts? I'd really like the
>>>>>> quick-start example to have a functional set of reports.
>>>>>>
>>>>>> Karl
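For readers following the thread later: the two rewrites Alexey proposes can be sanity-checked with Python's sqlite3 module, since SQLite, like Derby, has no DISTINCT ON. This is only a sketch, not the ACF code: the table t3 and its columns (bucket, activitycount, windowstart, windowend, plus a synthetic id primary key) are hypothetical stand-ins for the subquery result discussed above, and "FIRST" is taken to mean the row with the highest activitycount per bucket.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical stand-in for the t3 subquery result in the thread.
cur.execute("""CREATE TABLE t3 (
    id INTEGER PRIMARY KEY,
    bucket TEXT,
    activitycount INTEGER,
    windowstart INTEGER,
    windowend INTEGER)""")
cur.executemany(
    "INSERT INTO t3 (bucket, activitycount, windowstart, windowend) "
    "VALUES (?, ?, ?, ?)",
    [("a", 10, 100, 200),
     ("a", 30, 300, 400),
     ("b", 20, 150, 250),
     ("b",  5,  50, 150)])

# Rewrite 1: correlated subquery. Keep the row whose activitycount
# equals the per-bucket maximum.
subquery = """
SELECT bucket, activitycount, windowstart, windowend
FROM t3 AS t1
WHERE activitycount = (SELECT MAX(activitycount) FROM t3 AS t2
                       WHERE t2.bucket = t1.bucket)
ORDER BY bucket"""

# Rewrite 2: self-join anti-join. Keep rows for which no other row
# in the same bucket has a larger activitycount (the join finds a
# "better" row; WHERE ... IS NULL means none exists).
antijoin = """
SELECT t1.bucket, t1.activitycount, t1.windowstart, t1.windowend
FROM t3 AS t1
LEFT OUTER JOIN t3 AS t2
  ON t1.bucket = t2.bucket AND t2.activitycount > t1.activitycount
WHERE t2.id IS NULL
ORDER BY t1.bucket"""

r1 = cur.execute(subquery).fetchall()
r2 = cur.execute(antijoin).fetchall()
print(r1)  # one row per bucket, the max-activitycount row
print(r2)  # same result via the anti-join
```

One caveat: both rewrites return duplicate rows for a bucket when two rows tie on the sort column, whereas Postgresql's DISTINCT ON always returns exactly one row per group; full equivalence would need a tie-breaker on the primary key.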