On 15-Jul-08, at 10:15 AM, Richard Holland wrote:

Olivier is right.

BioMart has been built around the rather convenient but undocumented
feature (or bug?) of MySQL that identical queries will always return
results in the same order, thus allowing LIMIT to be used without
ORDER BY.
Postgres, being a 'clever' database, in common with Oracle, does not
guarantee row order because it does not process rows in the simplistic
sequential file access method that MySQL does. Therefore any LIMIT
condition _must_ be accompanied by an ORDER BY.

Richard,
not sure if not returning rows in expected order makes a database a particularly clever one ;)
and would not call returning rows in expected order a bug  either ;)

but seriously, MySQL is no different than Postgress or oracle in this respect. MySQL manual states explicitly (as any other rdbms for that matter) that the row order during retrieval is not guaranteed. This in particular true when after creating table a number of rows have been deleted or re-inserted.
In practice however as per our testing row order  for read only data
tends to be maintained to a degree allowing batching system to work across platforms. (Oliver - it would be helpful for us to look into your data closer to see how/when this order breaks
so if you could get in touch with us off the list we could have a look)
Adding 'order by' for batching is obviously a no brainer :) - however this has a rather dramatic negative impact on the performance as you may expect - rendering batching useless
so not really a solution either.

BTW, we are currently thinking about replacing batching with an alternative approach but this is
not yet tested so can't really comment on that


a.




The only reason BioMart uses LIMIT is to try and speed up processing
by using an internal process called 'batching' to break up results
into smaller units to process. Even in MySQL this is not an optimum
setup because MySQL still executes and processes all the rows up to
the start of the OFFSET, meaning that the number of rows MySQL has to
process is in the order of O(n squared).

In other words, the only way to make BioMart play nicely with Oracle
and Postgres is to modify it to use ORDER BY in all queries to force
the RDBMS to return rows in the same order every time, therefore
allowing LIMIT to work in the way it does in MySQL (and thus the way
BioMart expects it to work).

How this might be achieved though I don't know.

cheers,
Richard



2008/7/15 Olivier Arnaiz <[EMAIL PROTECTED]>:
Hmm ... no... I am talking about 'view all' and the export into file, not
about the preview section.
Apparently for postgres the ORDER BY should be necessary... Probably not for
MySql.
Thanks
Olivier
Le 15 juil. 08 à 11:08, Syed Haider a écrit :

Hi Olivier,
I guess you are concerned about the preview section of results in
MartView. The preview section certainly does not guarantee the order and
it does not offer you the controls to navigate through a slice of
results starting from an OFFSET to a LIMIT. This function will be
offered in the next release. And by doing this, we will make sure that
the ordering of results is consistent too.

cheers
syed
On Tue, 2008-07-15 at 10:57 +0200, Olivier Arnaiz wrote:

Hi,

Actually my table is already built around a sorted column.

This is a naive question  but is it a difference between postgreSQL
and MySQL ? (see this page)
http://www.postgresql.org/docs/current/interactive/queries-limit.html

"When using LIMIT, it is important to use an ORDER BY clause that
constrains the result rows into a unique order. Otherwise you will get
an unpredictable subset of the query's rows. You might be asking for
the tenth through twentieth rows, but tenth through twentieth in what
ordering? The ordering is unknown, unless you specified ORDER BY. "

Thanks for your help
Olivier



Le 11 juil. 08 à 20:29, Richard Holland a écrit :

I'm guessing that you're seeing duplicate results in your output, or
some that are completely missing, or a mixture of both?

Both .... because the results are not sorted

Le 12 juil. 08 à 01:15, Syed Haider a écrit :

Hi Olivier,

If I understand you correctly, and you want to have ORDER BY clause
in
biomart-perl query builder, I would suggest to build your table
around
the required ORDER BY columns. If you execute ORDER BY on the fly
(via
biomart-perl), this will be done with each batch of the query and
similarly every time a user/new user asks for results with different
attributes/filters. This would make your query response very slow.
If
you know the bottle neck which i guess is the case, better fix it on
the
table level.

regards
syed



On Fri, 2008-07-11 at 17:22 +0200, Olivier Arnaiz wrote:

Hello,



I am using biomart with a postgres database (7.4).
And all seem work perfectly with the count of the results but I
can
not obtain a good table with an 'export' or a 'view all results'.



When I see the log4 log, I can remark that the query is split in
many sql
using LIMIT and OFFSET... So why not ...
but I do not see an ORDER BY command and this parameter seems to
be
necessary to obtain my good results...



SELECT main.name,[dm_table].field FROM main, dm WHERE
main.key=dm.key AND dm.field =t LIMIT 200
SELECT main.name,[dm_table].field FROM main, dm WHERE
main.key=dm.key AND dm.field =t LIMIT 200 OFFSET 200
SELECT main.name,[dm_table].field FROM main, dm WHERE
main.key=dm.key AND dm.field =t LIMIT 400 OFFSET 400



instead of :
SELECT main.name,[dm_table].field FROM main, dm WHERE
main.key=dm.key AND dm.field =t ORDER BY main.key LIMIT 200
SELECT main.name,[dm_table].field FROM main, dm WHERE
main.key=dm.key AND dm.field =t ORDER BY main.key LIMIT 200 OFFSET
200
SELECT main.name,[dm_table].field FROM main, dm WHERE
main.key=dm.key AND dm.field =t ORDER BY main.key LIMIT 400 OFFSET
400







That is a problem with postgres versus MySQL (I do not have
any experience with MySQL) or a problem with my config ... or
something else ?



Any suggestions
Thanks



Olivier





--
Olivier Arnaiz
CGM-CNRS
91198 Gif-sur-Yvette
01 69 82 43 75



ParameciumDB:
 http://paramecium.cgm.cnrs-gif.fr/













--
======================================
Syed Haider.
EMBL-European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton,
Cambridge CB10 1SD, UK.
======================================


--
Olivier Arnaiz
CGM-CNRS
91198 Gif-sur-Yvette
01 69 82 43 75

ParameciumDB:
 http://paramecium.cgm.cnrs-gif.fr/






--
======================================
Syed Haider.
EMBL-European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton,
Cambridge CB10 1SD, UK.
======================================

--
Olivier Arnaiz
CGM-CNRS
91198 Gif-sur-Yvette
01 69 82 43 75
ParameciumDB:
 http://paramecium.cgm.cnrs-gif.fr/





Reply via email to