-What I'am trying to do: Bit hard to explain. I've got a table consisting of ip addresses (ipv4_src), destination addresses (ipv4_dst), and port numbers (port_dst) and some other irrelevant columns. Ultimately my goal is to find a linear order in a subset of ports. For example, host A connects to B on port 1, 2, 3, 4,...20, I would like to find this linear relation. To achieve this, I have written some mathematical formula's. Unfortunately there is one downside to my formula: It can be fooled by adding some extreme values. It cannot find a linear relation in this list: 1, 2, 3, 4,...20, 45000. Although there are 20 numbers lined up, the 45000 number ruïns the result of the algorithm. So the query I've submitted, is ment to remove extreme values. The methods boxplot(pF.port_dst,"LOW") and boxplot(pF.port_dst,"HIGH") calculate in linear time the allowed range of numbers. Extreme values won't be included in this range. So in the example, the range would be [1,20] therby omitting the value 45000. Finally I would like to filter my table with port numbers for every <ipv4_src><ipv4_dst> tuple and remove all the numbers not fitting in the range.
-In human readable pseudo code this is the query: SELECT source,dest,port,octets FROM ( SELECT source,dest,boxplot(port,"LOW") AS low,boxplot(port,"HIGH") AS high FROM --Calculate the LOW and HIGH values for each source,dest pair. ( SELECT source,dest,port,octets FROM... GROUP BY source,dest,port --This removes the duplicate entries. ) pF GROUP BY source,dest ) boxplot ( SELECT source,dest,port,octets FROM... GROUP BY source,dest,port --This removes the duplicate entries (again!). ) filter WHERE filter.source=boxplot.source AND filter.dest=boxplot.dest AND filter.port>=boxplot.LOW AND filter.port<=boxplot.HIGH --Relate the tables 'boxplot' and 'filter' to eachother AND select only the <source,dest,port> tuples where port is in the range [LOW,HIGH] from the filter table. -Here is the original query I would like to optimize again: SELECT filter.ipv4_src, filter.ipv4_dst, filter.port_dst, filter.octets FROM ( SELECT pF.ipv4_src, pF.ipv4_dst, boxplot(pF.port_dst,"LOW") AS low,boxplot(pF.port_dst,"HIGH") AS high FROM ( SELECT ipv4_src, ipv4_dst, port_dst, octets FROM ... GROUP BY ipv4_src, ipv4_dst, port_dst ) pF GROUP BY pF.ipv4_src, pF.ipv4_dst ) boxplot, ( SELECT ipv4_src, ipv4_dst, port_dst, octets FROM ... GROUP BY ipv4_src, ipv4_dst, port_dst )filter WHERE filter.ipv4_src=boxplot.ipv4_src AND filter.ipv4_dst=boxplot.ipv4_dst AND filter.port_dst>=boxplot.low AND filter.port_dst<=boxplot.low I'll try the 'EXPLAIN' query tonight, but therefor I've got to create a small database first to speed up the results. Thnx in advance for your help. On Jan 22, 2008 8:15 AM, Sebastian Mendel <[EMAIL PROTECTED]> wrote: > Joris Kinable schrieb: > > > Optimize query > > > > I've got one query, which I would like to improve a lot since it takes > > very long (>24 hours) to execute. Here is the idea: > > 1. Take the table <ipv4_src><ipv4_dst><port_dst> (other rows in this > > table are not mentioned for clearity) and remove all duplicate > > tuple's. This is done by subquery 'filter'. > > 2. The same query is performed by the boxplot query, but this time an > > aditional group by command is executed, therby calculating a User > > Defined Function boxplot(row,type) which returns a double value. > > 3. Finally the results of the query in step 2 are used to select a > > subset of results from the 'filter' table. > > 4. As you can see, the subquery 'pF' used in step 2 is identical to > > the query 'filter'. It's an extreme waste to calculate the same table > > twice. I've tried to create a temporary table from filter, but > > unfortunately Mysql doesn't allow you to access a temporary table > > twice in the same query. I prefer a 1 query answer, instead of > > creating views, or temporary tables. > > > > Is there a way to improve this query, therby improving the execution time? > > > > Query: > > > > SELECT filter.ipv4_src, filter.ipv4_dst, filter.port_dst, filter.octets FROM > > ( > > SELECT pF.ipv4_src, pF.ipv4_dst, boxplot(pF.port_dst,"LOW") AS > > low,boxplot(pF.port_dst,"HIGH") AS high FROM > > ( > > SELECT ipv4_src, ipv4_dst, port_dst, octets FROM mytable WHERE > > prot=6 GROUP BY ipv4_src, ipv4_dst, port_dst ORDER BY > > ipv4_src,ipv4_dst,port_dst ASC > > ) pF GROUP BY pF.ipv4_src, pF.ipv4_dst HAVING > > COUNT(filter.port_dst)>10 > > ) boxplot, > > ( > > SELECT ipv4_src, ipv4_dst, port_dst, octets FROM mytable WHERE prot=6 > > GROUP BY ipv4_src, ipv4_dst, port_dst ORDER BY > > ipv4_src,ipv4_dst,port_dst ASC > > ) filter > > WHERE filter.ipv4_src=boxplot.ipv4_src AND > > filter.ipv4_dst=boxplot.ipv4_dst AND filter.port_dst>=boxplot.low AND > > filter.port_dst<=boxplot.low > > what you are trying to do? > > and how about formating your query in a human readable way? > > did you tried EXPLAIN? > > what type of syntax is this: "boxplot(pF.port_dst,"LOW")" ? > > > -- > Sebastian > -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]