Olivier is right. BioMart has been built around the rather convenient but undocumented feature (or bug?) of MySQL that identical queries will always return results in the same order, thus allowing LIMIT to be used without ORDER BY.
Postgres, being a 'clever' database, in common with Oracle, does not guarantee row order because it does not process rows in the simplistic sequential file access method that MySQL does. Therefore any LIMIT condition _must_ be accompanied by an ORDER BY. The only reason BioMart uses LIMIT is to try and speed up processing by using an internal process called 'batching' to break up results into smaller units to process. Even in MySQL this is not an optimum setup because MySQL still executes and processes all the rows up to the start of the OFFSET, meaning that the number of rows MySQL has to process is in the order of O(n squared). In other words, the only way to make BioMart play nicely with Oracle and Postgres is to modify it to use ORDER BY in all queries to force the RDBMS to return rows in the same order every time, therefore allowing LIMIT to work in the way it does in MySQL (and thus the way BioMart expects it to work). How this might be achieved though I don't know. cheers, Richard 2008/7/15 Olivier Arnaiz <[EMAIL PROTECTED]>: > Hmm ... no... I am talking about 'view all' and the export into file, not > about the preview section. > Apparently for postgres the ORDER BY should be necessary... Probably not for > MySql. > Thanks > Olivier > Le 15 juil. 08 à 11:08, Syed Haider a écrit : > > Hi Olivier, > I guess you are concerned about the preview section of results in > MartView. The preview section certainly does not guarantee the order and > it does not offer you the controls to navigate through a slice of > results starting from an OFFSET to a LIMIT. This function will be > offered in the next release. And by doing this, we will make sure that > the ordering of results is consistent too. > > cheers > syed > On Tue, 2008-07-15 at 10:57 +0200, Olivier Arnaiz wrote: > > Hi, > > Actually my table is already built around a sorted column. > > This is a naive question but is it a difference between postgreSQL > and MySQL ? (see this page) > http://www.postgresql.org/docs/current/interactive/queries-limit.html > > "When using LIMIT, it is important to use an ORDER BY clause that > constrains the result rows into a unique order. Otherwise you will get > an unpredictable subset of the query's rows. You might be asking for > the tenth through twentieth rows, but tenth through twentieth in what > ordering? The ordering is unknown, unless you specified ORDER BY. " > > Thanks for your help > Olivier > > > > Le 11 juil. 08 à 20:29, Richard Holland a écrit : > > I'm guessing that you're seeing duplicate results in your output, or > some that are completely missing, or a mixture of both? > > Both .... because the results are not sorted > > Le 12 juil. 08 à 01:15, Syed Haider a écrit : > > Hi Olivier, > > If I understand you correctly, and you want to have ORDER BY clause > in > biomart-perl query builder, I would suggest to build your table > around > the required ORDER BY columns. If you execute ORDER BY on the fly > (via > biomart-perl), this will be done with each batch of the query and > similarly every time a user/new user asks for results with different > attributes/filters. This would make your query response very slow. > If > you know the bottle neck which i guess is the case, better fix it on > the > table level. > > regards > syed > > > > On Fri, 2008-07-11 at 17:22 +0200, Olivier Arnaiz wrote: > > Hello, > > > > I am using biomart with a postgres database (7.4). > And all seem work perfectly with the count of the results but I > can > not obtain a good table with an 'export' or a 'view all results'. > > > > When I see the log4 log, I can remark that the query is split in > many sql > using LIMIT and OFFSET... So why not ... > but I do not see an ORDER BY command and this parameter seems to > be > necessary to obtain my good results... > > > > SELECT main.name,[dm_table].field FROM main, dm WHERE > main.key=dm.key AND dm.field =t LIMIT 200 > SELECT main.name,[dm_table].field FROM main, dm WHERE > main.key=dm.key AND dm.field =t LIMIT 200 OFFSET 200 > SELECT main.name,[dm_table].field FROM main, dm WHERE > main.key=dm.key AND dm.field =t LIMIT 400 OFFSET 400 > > > > instead of : > SELECT main.name,[dm_table].field FROM main, dm WHERE > main.key=dm.key AND dm.field =t ORDER BY main.key LIMIT 200 > SELECT main.name,[dm_table].field FROM main, dm WHERE > main.key=dm.key AND dm.field =t ORDER BY main.key LIMIT 200 OFFSET > 200 > SELECT main.name,[dm_table].field FROM main, dm WHERE > main.key=dm.key AND dm.field =t ORDER BY main.key LIMIT 400 OFFSET > 400 > > > > > > > > That is a problem with postgres versus MySQL (I do not have > any experience with MySQL) or a problem with my config ... or > something else ? > > > > Any suggestions > Thanks > > > > Olivier > > > > > > -- > Olivier Arnaiz > CGM-CNRS > 91198 Gif-sur-Yvette > 01 69 82 43 75 > > > > ParameciumDB: > http://paramecium.cgm.cnrs-gif.fr/ > > > > > > > > > > > > > > -- > ====================================== > Syed Haider. > EMBL-European Bioinformatics Institute > Wellcome Trust Genome Campus, Hinxton, > Cambridge CB10 1SD, UK. > ====================================== > > > -- > Olivier Arnaiz > CGM-CNRS > 91198 Gif-sur-Yvette > 01 69 82 43 75 > > ParameciumDB: > http://paramecium.cgm.cnrs-gif.fr/ > > > > > > > -- > ====================================== > Syed Haider. > EMBL-European Bioinformatics Institute > Wellcome Trust Genome Campus, Hinxton, > Cambridge CB10 1SD, UK. > ====================================== > > -- > Olivier Arnaiz > CGM-CNRS > 91198 Gif-sur-Yvette > 01 69 82 43 75 > ParameciumDB: > http://paramecium.cgm.cnrs-gif.fr/ > > > >
