Re: group by help?

Richard Emery Fri, 19 Apr 2002 06:34:23 -0700

As a relational database, it doe not matter the order in which data are
stored.


Rather, what matters are the methods by which data are extracted via the
SELECT statement.  In order to derive benefits of GROUP BY, you must use it
with SELECTs that use GROUP BY functionality, such as SUM(), COUNT().

If you need to know the internal machinations of mysql, I can only suggest
that you study the source code.
----- Original Message -----
From: Mick Watson <[EMAIL PROTECTED]>
To: Richard Emery <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Friday, April 19, 2002 7:34 AM
Subject: Re: group by help?


Thank you :-)

I do realise the SQL is not being used as it should be, but as it is not my
database and nor is it my SQL, there's not much I can do to really change
either of these.... I merely have to work with a database that has been
produced by a third party group, and I must say at this point that I have
nothing but praise for the ensembl group for the difficult job that they
have
done very well (see http://www.ensembl.org)

So, I realise this is not the problem of mySQL, but what I want to try and
find out is really the internal workings of mySQL when it performs this
operation so that I can understand why my copy of ensembl, which is produced
from direct dumps of the main ensembl, behaves differently to the main
ensembl.

I guess from what you are saying, that the data is stored randomly, that
there
is very little I can do to actually make my database behave the same as the
main database?  Could it be affected in any way by operating system and/or
file system?  Superficially the data is organised in exactly the same way in
both databases, but I have no doubt that things like memory locations are
completely different, but possibly if I could understand what the variables
are that affect this behaviour I could minimise the inconsistancy...?

Thanks for your time

Mick

Richard Emery wrote:

> mysql is acting correctly.
>
> GROUP BY is used to consolidate data for SUMming, COUNTing, etc.  Your
> SELECT statement makes not such request.  You have simply requested the
> value of a specific field.  Data are stored in mysql databases randomly.
> Therefore, when you request a field's data, you are getting whatever is
> first in the list of records matching your WHERE clause.
>
> Bottom line: you are NOT using GROUP BY as it is supposed to be used.
Your
> SQL is in error, not mysql.
>
> hope this helps...
> ----- Original Message -----
> From: Mick Watson <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Friday, April 19, 2002 6:01 AM
> Subject: group by help?
>
> Hi
>
> I am having some trouble figuring out how mySQL interprets the group by
> clause in a particular set of data that I have (the ensembl database if
> anyone is familar with it!)
>
> Now, I have some data like this:
>
> +---------+------------+----------+
> | gene_id | display_id | db_name  |
> +---------+------------+----------+
> |   24173 | Q9H701     | SPTREMBL |
> |   24173 | Q96GS5     | SPTREMBL |
> +---------+------------+----------+
>
> and this is produced by the following SQL:
>
>  select t.gene_id, x.display_id,e.db_name
>       from ensembl_core_test.objectXref     as ox,
>             ensembl_core_test.Xref           as x,
>             ensembl_core_test.transcript     as t,
>             ensembl_core_test.externalDB     as e
>      where e.db_name='SPTREMBL' and
>             x.xrefID         = ox.xrefID and
>            t.translation_id = ox.ensembl_id and
>            e.externalDBId=x.externalDBId and gene_id =24173;
>
> Now, the SQL is not important, what is is that we have two display_ids
> for one gene_id.  Now, if we add a "group by gene_id" clause into the
> above SQL, then presumably mySQL must make an arbitrary decision on
> which display_id to choose.  And the odd thing is that in the main
> ensembl database it chooses one, and in my local copy it chooses the
> other!
>
> So what I want to figure out is how mySQL makes that arbitrary decision
> - is it based on which it comes across first in memory, which it comes
> across last, alphabetical order, random choice (though mySQL is always
> consistent in which it chooses) ... or is there some other way it will
> make the decision?
>
> Furthermore, and more confusingly, mySQL chooses differently if I
> parameterise the SQL.  For example:
>
> mysql> create table test
>     -> select t.gene_id, x.display_id,e.db_name
>     ->       from ensembl_core_test.objectXref     as ox,
>     ->            ensembl_core_test.Xref           as x,
>     ->            ensembl_core_test.transcript     as t,
>     ->       ensembl_core_test.externalDB     as e
>     ->      where e.db_name='SPTREMBL' and
>     ->            x.xrefID         = ox.xrefID and
>     ->            t.translation_id = ox.ensembl_id and
>     ->            e.externalDBId=x.externalDBId and gene_id = 24173
>     ->   group by gene_id
>     ->   order by gene_id;
> Query OK, 1 row affected (0.00 sec)
> Records: 1  Duplicates: 0  Warnings: 0
>
> mysql> select * from test;
> +---------+------------+----------+
> | gene_id | display_id | db_name  |
> +---------+------------+----------+
> |   24173 | Q9H701     | SPTREMBL |
> +---------+------------+----------+
> 1 row in set (0.00 sec)
>
> compare this to:
>
> mysql> create table test
>     -> select t.gene_id, x.display_id,e.db_name
>     ->       from ensembl_core_test.objectXref     as ox,
>     ->            ensembl_core_test.Xref           as x,
>     ->            ensembl_core_test.transcript     as t,
>     ->       ensembl_core_test.externalDB     as e
>     ->      where e.db_name='SPTREMBL' and
>     ->            x.xrefID         = ox.xrefID and
>     ->            t.translation_id = ox.ensembl_id and
>     ->            e.externalDBId=x.externalDBId
>     ->   group by gene_id
>     ->   order by gene_id;
> Query OK, 11674 rows affected (6.84 sec)
> Records: 11674  Duplicates: 0  Warnings: 0
>
> mysql> select * from test where gene_id = 24173;
> +---------+------------+----------+
> | gene_id | display_id | db_name  |
> +---------+------------+----------+
> |   24173 | Q96GS5     | SPTREMBL |
> +---------+------------+----------+
> 1 row in set (0.01 sec)
>
> So here we see that mySQL has chosen differently simply because of the
> presence or absence of the gene_id = 24173 in the create table command
>
> I want to try and figure out why mySQL is making these rather
> inconsistant decisions and see if it's possible to remove this "feature"
>
> Thanks for your time, if there is another mailing list which is more
> appropriate, please tell me!
>
> Thanks
> Mick
>
> ---------------------------------------------------------------------
> Before posting, please check:
>    http://www.mysql.com/manual.php   (the manual)
>    http://lists.mysql.com/           (the list archive)
>
> To request this thread, e-mail <[EMAIL PROTECTED]>
> To unsubscribe, e-mail
> <[EMAIL PROTECTED]>
> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php


---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail
<[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php



---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Re: group by help?

Reply via email to