-What I'am trying to do:
Bit hard to explain. I've got a table consisting of ip addresses
(ipv4_src), destination addresses (ipv4_dst), and port numbers
(port_dst) and some other irrelevant columns. Ultimately my goal is to
find a linear order in a subset of ports. For example, host A connects
to B on port 1, 2, 3, 4,...20, I would like to find this linear
relation. To achieve this, I have written some mathematical formula's.
Unfortunately there is one downside to my formula: It can be fooled by
adding some extreme values. It cannot find a linear relation in this
list: 1, 2, 3, 4,...20, 45000. Although there are 20 numbers lined up,
the 45000 number ruïns the result of the algorithm. So the query I've
submitted, is ment to remove extreme values. The methods
boxplot(pF.port_dst,"LOW") and boxplot(pF.port_dst,"HIGH") calculate
in linear time the allowed range of numbers. Extreme values won't be
included in this range. So in the example, the range would be [1,20]
therby omitting the value 45000. Finally I would like to filter my
table with port numbers for every <ipv4_src><ipv4_dst> tuple and
remove all the numbers not fitting in the range.

-In human readable pseudo code this is the query:

SELECT source,dest,port,octets FROM
(
        SELECT source,dest,boxplot(port,"LOW") AS low,boxplot(port,"HIGH") AS
high FROM --Calculate the LOW and HIGH values for each source,dest
pair.
        (
                SELECT source,dest,port,octets FROM... GROUP BY source,dest,port
--This removes the duplicate entries.
        ) pF GROUP BY source,dest
) boxplot
(
        SELECT source,dest,port,octets FROM... GROUP BY source,dest,port
--This removes the duplicate entries (again!).
) filter
WHERE filter.source=boxplot.source AND filter.dest=boxplot.dest AND
filter.port>=boxplot.LOW AND filter.port<=boxplot.HIGH --Relate the
tables 'boxplot' and 'filter' to eachother AND select only the
<source,dest,port> tuples where port is in the range [LOW,HIGH] from
the filter table.


-Here is the original query I would like to optimize again:

SELECT filter.ipv4_src, filter.ipv4_dst, filter.port_dst, filter.octets FROM
(
        SELECT pF.ipv4_src, pF.ipv4_dst, boxplot(pF.port_dst,"LOW") AS
low,boxplot(pF.port_dst,"HIGH") AS high FROM
        (
                SELECT ipv4_src, ipv4_dst, port_dst, octets FROM ... GROUP BY
ipv4_src, ipv4_dst, port_dst
        ) pF GROUP BY pF.ipv4_src, pF.ipv4_dst
) boxplot,
(
        SELECT ipv4_src, ipv4_dst, port_dst, octets FROM ... GROUP BY
ipv4_src, ipv4_dst, port_dst
)filter
WHERE filter.ipv4_src=boxplot.ipv4_src AND
filter.ipv4_dst=boxplot.ipv4_dst AND filter.port_dst>=boxplot.low AND
filter.port_dst<=boxplot.low

I'll try the 'EXPLAIN' query tonight, but therefor I've got to create
a small database first to speed up the results.

Thnx in advance for your help.



On Jan 22, 2008 8:15 AM, Sebastian Mendel <[EMAIL PROTECTED]> wrote:
> Joris Kinable schrieb:
>
> > Optimize query
> >
> > I've got one query, which I would like to improve a lot since it takes
> > very long (>24 hours) to execute. Here is the idea:
> > 1. Take the table <ipv4_src><ipv4_dst><port_dst> (other rows in this
> > table are not mentioned for clearity) and remove all duplicate
> > tuple's. This is done by subquery 'filter'.
> > 2. The same query is performed by the boxplot query, but this time an
> > aditional group by command is executed, therby calculating a User
> > Defined Function boxplot(row,type) which returns a double value.
> > 3. Finally the results of the query in step 2 are used to select a
> > subset of results from the 'filter' table.
> > 4. As you can see, the subquery 'pF' used in step 2 is identical to
> > the query 'filter'. It's an extreme waste to calculate the same table
> > twice. I've tried to create a temporary table from filter, but
> > unfortunately Mysql doesn't allow you to access a temporary table
> > twice in the same query. I prefer a 1 query answer, instead of
> > creating views, or temporary tables.
> >
> > Is there a way to improve this query, therby improving the execution time?
> >
> > Query:
> >
> > SELECT filter.ipv4_src, filter.ipv4_dst, filter.port_dst, filter.octets FROM
> > (
> >       SELECT pF.ipv4_src, pF.ipv4_dst, boxplot(pF.port_dst,"LOW") AS
> > low,boxplot(pF.port_dst,"HIGH") AS high FROM
> >       (
> >               SELECT ipv4_src, ipv4_dst, port_dst, octets FROM mytable WHERE
> > prot=6 GROUP BY ipv4_src, ipv4_dst, port_dst ORDER BY
> > ipv4_src,ipv4_dst,port_dst ASC
> >       ) pF GROUP BY pF.ipv4_src, pF.ipv4_dst HAVING 
> > COUNT(filter.port_dst)>10
> > ) boxplot,
> > (
> >       SELECT ipv4_src, ipv4_dst, port_dst, octets FROM mytable WHERE prot=6
> > GROUP BY ipv4_src, ipv4_dst, port_dst ORDER BY
> > ipv4_src,ipv4_dst,port_dst ASC
> > ) filter
> > WHERE filter.ipv4_src=boxplot.ipv4_src AND
> > filter.ipv4_dst=boxplot.ipv4_dst AND filter.port_dst>=boxplot.low AND
> > filter.port_dst<=boxplot.low
>
> what you are trying to do?
>
> and how about formating your query in a human readable way?
>
> did you tried EXPLAIN?
>
> what type of syntax is this: "boxplot(pF.port_dst,"LOW")" ?
>
>
> --
> Sebastian
>

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to