Thank you :-) I do realise the SQL is not being used as it should be, but as it is not my database and nor is it my SQL, there's not much I can do to really change either of these.... I merely have to work with a database that has been produced by a third party group, and I must say at this point that I have nothing but praise for the ensembl group for the difficult job that they have done very well (see http://www.ensembl.org)
So, I realise this is not the problem of mySQL, but what I want to try and find out is really the internal workings of mySQL when it performs this operation so that I can understand why my copy of ensembl, which is produced from direct dumps of the main ensembl, behaves differently to the main ensembl. I guess from what you are saying, that the data is stored randomly, that there is very little I can do to actually make my database behave the same as the main database? Could it be affected in any way by operating system and/or file system? Superficially the data is organised in exactly the same way in both databases, but I have no doubt that things like memory locations are completely different, but possibly if I could understand what the variables are that affect this behaviour I could minimise the inconsistancy...? Thanks for your time Mick Richard Emery wrote: > mysql is acting correctly. > > GROUP BY is used to consolidate data for SUMming, COUNTing, etc. Your > SELECT statement makes not such request. You have simply requested the > value of a specific field. Data are stored in mysql databases randomly. > Therefore, when you request a field's data, you are getting whatever is > first in the list of records matching your WHERE clause. > > Bottom line: you are NOT using GROUP BY as it is supposed to be used. Your > SQL is in error, not mysql. > > hope this helps... > ----- Original Message ----- > From: Mick Watson <[EMAIL PROTECTED]> > To: <[EMAIL PROTECTED]> > Sent: Friday, April 19, 2002 6:01 AM > Subject: group by help? > > Hi > > I am having some trouble figuring out how mySQL interprets the group by > clause in a particular set of data that I have (the ensembl database if > anyone is familar with it!) > > Now, I have some data like this: > > +---------+------------+----------+ > | gene_id | display_id | db_name | > +---------+------------+----------+ > | 24173 | Q9H701 | SPTREMBL | > | 24173 | Q96GS5 | SPTREMBL | > +---------+------------+----------+ > > and this is produced by the following SQL: > > select t.gene_id, x.display_id,e.db_name > from ensembl_core_test.objectXref as ox, > ensembl_core_test.Xref as x, > ensembl_core_test.transcript as t, > ensembl_core_test.externalDB as e > where e.db_name='SPTREMBL' and > x.xrefID = ox.xrefID and > t.translation_id = ox.ensembl_id and > e.externalDBId=x.externalDBId and gene_id =24173; > > Now, the SQL is not important, what is is that we have two display_ids > for one gene_id. Now, if we add a "group by gene_id" clause into the > above SQL, then presumably mySQL must make an arbitrary decision on > which display_id to choose. And the odd thing is that in the main > ensembl database it chooses one, and in my local copy it chooses the > other! > > So what I want to figure out is how mySQL makes that arbitrary decision > - is it based on which it comes across first in memory, which it comes > across last, alphabetical order, random choice (though mySQL is always > consistent in which it chooses) ... or is there some other way it will > make the decision? > > Furthermore, and more confusingly, mySQL chooses differently if I > parameterise the SQL. For example: > > mysql> create table test > -> select t.gene_id, x.display_id,e.db_name > -> from ensembl_core_test.objectXref as ox, > -> ensembl_core_test.Xref as x, > -> ensembl_core_test.transcript as t, > -> ensembl_core_test.externalDB as e > -> where e.db_name='SPTREMBL' and > -> x.xrefID = ox.xrefID and > -> t.translation_id = ox.ensembl_id and > -> e.externalDBId=x.externalDBId and gene_id = 24173 > -> group by gene_id > -> order by gene_id; > Query OK, 1 row affected (0.00 sec) > Records: 1 Duplicates: 0 Warnings: 0 > > mysql> select * from test; > +---------+------------+----------+ > | gene_id | display_id | db_name | > +---------+------------+----------+ > | 24173 | Q9H701 | SPTREMBL | > +---------+------------+----------+ > 1 row in set (0.00 sec) > > compare this to: > > mysql> create table test > -> select t.gene_id, x.display_id,e.db_name > -> from ensembl_core_test.objectXref as ox, > -> ensembl_core_test.Xref as x, > -> ensembl_core_test.transcript as t, > -> ensembl_core_test.externalDB as e > -> where e.db_name='SPTREMBL' and > -> x.xrefID = ox.xrefID and > -> t.translation_id = ox.ensembl_id and > -> e.externalDBId=x.externalDBId > -> group by gene_id > -> order by gene_id; > Query OK, 11674 rows affected (6.84 sec) > Records: 11674 Duplicates: 0 Warnings: 0 > > mysql> select * from test where gene_id = 24173; > +---------+------------+----------+ > | gene_id | display_id | db_name | > +---------+------------+----------+ > | 24173 | Q96GS5 | SPTREMBL | > +---------+------------+----------+ > 1 row in set (0.01 sec) > > So here we see that mySQL has chosen differently simply because of the > presence or absence of the gene_id = 24173 in the create table command > > I want to try and figure out why mySQL is making these rather > inconsistant decisions and see if it's possible to remove this "feature" > > Thanks for your time, if there is another mailing list which is more > appropriate, please tell me! > > Thanks > Mick > > --------------------------------------------------------------------- > Before posting, please check: > http://www.mysql.com/manual.php (the manual) > http://lists.mysql.com/ (the list archive) > > To request this thread, e-mail <[EMAIL PROTECTED]> > To unsubscribe, e-mail > <[EMAIL PROTECTED]> > Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php --------------------------------------------------------------------- Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail <[EMAIL PROTECTED]> To unsubscribe, e-mail <[EMAIL PROTECTED]> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
