Hi All,
Sorry to dredge this old email up again - did a fix ever make it into
the 0.7 branch in CVS? (Batching in Postgres coming back in an
unpredictable order, so requiring an 'ORDER BY' clause.)
Thanks,
Daz
On 21 Apr 2009, at 01:42, Junjun Zhang wrote:
So many emails today, didn't read this one.
Sure, I will add this to my list.
Junjun
From: [email protected] [mailto:[email protected]
] On Behalf Of Arek Kasprzyk
Sent: Monday, April 20, 2009 9:26 AM
To: Damian Smedley
Cc: Darren Oakley; [email protected]
Subject: Re: [DCC] HEEEELLLLPPPPP!!!!
I guess we have no choice. We'll add this, but I guess it means
slower batching for Postgres. Junjun - do you want to take care of
this?
A.
On 20-Apr-09, at 8:52, "Damian Smedley" <[email protected]> wrote:
On Mon, Apr 20, 2009 at 1:43 PM, Syed Haider <[email protected]> wrote:
Thanks a lot Damian and Darren for the fix. I see a control
statement in TableSet.pm, something like:
if (postgres) { do acrobatics }
which surely should be:
if (postgres) {
    warn("Please consider using another RDBMS!");
}
Let's coordinate with the DCC for the fix.
Cheers
Syed
Damian Smedley wrote:
from the docs:
LIMIT Clause
LIMIT { count | ALL } [ { OFFSET | , } start ]
OFFSET start
where count specifies the maximum number of rows to return, and
start specifies the number of rows to skip before starting to
return rows.
LIMIT allows you to retrieve just a portion of the rows that are
generated by the rest of the query. If a limit count is given, no
more than that many rows will be returned. If an offset is given,
that many rows will be skipped before starting to return rows.
When using LIMIT, it is a good idea to use an ORDER BY clause that
constrains the result rows into a unique order. Otherwise you will
get an unpredictable subset of the query's rows --- you may
be asking for the tenth through twentieth rows, but tenth
through twentieth in what ordering? You don't know what ordering,
unless you specified ORDER BY.
As of Postgres 7.0, the query optimizer takes LIMIT into account
when generating a query plan, so you are very likely to get
different plans (yielding different row orders) depending on what
you give for LIMIT and OFFSET. Thus, using different LIMIT/OFFSET
values to select different subsets of a query result will give
inconsistent results unless you enforce a predictable result
ordering with ORDER BY. This is not a bug; it is an inherent
consequence of the fact that SQL does not promise to deliver the
results of a query in any particular order unless ORDER BY is used
to constrain the order.
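Here is roughly what our batching looks like in those terms (the table
and column names below are made up for illustration - the real SQL is
whatever TableSet.pm generates):

-- two consecutive batches of 200 rows, no ORDER BY
SELECT gene_id, symbol FROM gene_main LIMIT 200 OFFSET 0;
SELECT gene_id, symbol FROM gene_main LIMIT 200 OFFSET 200;

With no ORDER BY, Postgres may plan and scan each statement
differently, so the two batches can overlap or skip rows entirely.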
I hacked my code to add an ORDER BY table_pkey before the limit
clauses, and it definitely fixes the random file size problem with
this dataset. I think we need to add this in for Postgres.
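For reference, the patched batches look something like this (again
with made-up names; table_pkey stands in for whatever the table's
primary-key column is):

-- same two batches, now ordered by the primary key
SELECT gene_id, symbol FROM gene_main ORDER BY table_pkey LIMIT 200 OFFSET 0;
SELECT gene_id, symbol FROM gene_main ORDER BY table_pkey LIMIT 200 OFFSET 200;

Because the primary key gives a unique, stable ordering, consecutive
LIMIT/OFFSET batches now partition the result set consistently.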
Cheers
Damian
On Mon, Apr 20, 2009 at 12:06 PM, Damian Smedley <[email protected]> wrote:
I got the dump ok! I have loaded it onto my server and local BioMart
and see the same behaviour.
I think I can get some consistent weirdness now:
1 Go to your KOMP mart and preview the first 200 results for the
'all gene symbols' query. Hitting the Results button again generally
keeps returning the same list, which is good.
2 From this same preview page now export all the results using the
GO button
3 Now hit the Results button again to regenerate the first 200
results and you will see a different order.
You can keep repeating this trick in steps 2 and 3 to see how the
table seems to get shuffled at some stage during the whole list
output. Interestingly after enough shuffles it seems to come
back to
the original order - the shuffling does not seem to be totally
random!
If you try the same trick with one of your MySQL datasets this
does
not happen.
It seems to me Postgres has some sort of behaviour where it shuffles
the table every so often (recaches it, I guess). We need to
investigate whether there is a setting to turn this off - if not,
then I think we need to add an ORDER BY on the primary key to all
Postgres batches.
Cheers
Damian
On Mon, Apr 20, 2009 at 11:56 AM, Darren Oakley <[email protected]> wrote:
Oops, looks like the database dump was too large to email...
I've uploaded it to one of our web servers instead:
http://www.i-dcc.org/dev/komp_dcc_mart.sql.gz
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.