Hi All,
Sorry to dredge this old email up again - did a fix ever make it into
the 0.7 branch in CVS? (Batching in Postgres coming back in an
unpredictable order, so requiring an 'ORDER BY' clause.)
Thanks,
Daz
On 21 Apr 2009, at 01:42, Junjun Zhang wrote:
So many emails today, didn't read this one.
Sure, I will add this to my list.
Junjun
From: [email protected] [mailto:[email protected]
] On Behalf Of Arek Kasprzyk
Sent: Monday, April 20, 2009 9:26 AM
To: Damian Smedley
Cc: Darren Oakley; [email protected]
Subject: Re: [DCC] HEEEELLLLPPPPP!!!!
I guess we have no choice. We'll add this, but I guess it means
slower batching for Postgres. Junjun - do you want to take care of
this?
A.
On 20-Apr-09, at 8:52, "Damian Smedley" <[email protected]> wrote:
On Mon, Apr 20, 2009 at 1:43 PM, Syed Haider <[email protected]> wrote:
Thanks a lot Damian and Darren for the fix. I see a control
statement in TableSet.pm, something like:
if (postgres) { do acrobatics }
which surely should be:
if (postgres) {
    warn("Please consider using another RDBMS!");
}
Let's coordinate with the DCC for the fix.
Cheers
Syed
Damian Smedley wrote:
from the docs:
LIMIT Clause
LIMIT { count | ALL } [ { OFFSET | , } start ]
OFFSET start
where count specifies the maximum number of rows to return, and
start specifies the number of rows to skip before starting to
return rows.
LIMIT allows you to retrieve just a portion of the rows that are
generated by the rest of the query. If a limit count is given, no
more than that many rows will be returned. If an offset is given,
that many rows will be skipped before starting to return rows.
When using LIMIT, it is a good idea to use an ORDER BY clause that
constrains the result rows into a unique order. Otherwise you will
get an unpredictable subset of the query's rows --- you may
be asking for the tenth through twentieth rows, but tenth
through twentieth in what ordering? You don't know what ordering,
unless you specified ORDER BY.
As of Postgres 7.0, the query optimizer takes LIMIT into account
when generating a query plan, so you are very likely to get
different plans (yielding different row orders) depending on what
you give for LIMIT and OFFSET. Thus, using different LIMIT/OFFSET
values to select different subsets of a query result will give
inconsistent results unless you enforce a predictable result
ordering with ORDER BY. This is not a bug; it is an inherent
consequence of the fact that SQL does not promise to deliver the
results of a query in any particular order unless ORDER BY is used
to constrain the order.
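Here is roughly what our batching looks like in those terms (the table
and column names below are made up for illustration - the real SQL is
whatever TableSet.pm generates):

-- two consecutive batches of 200 rows, no ORDER BY
SELECT gene_id, symbol FROM gene_main LIMIT 200 OFFSET 0;
SELECT gene_id, symbol FROM gene_main LIMIT 200 OFFSET 200;

With no ORDER BY, Postgres may plan and scan each statement
differently, so the two batches can overlap or skip rows entirely.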
I hacked my code to add an ORDER BY table_pkey before the limit
clauses, and it definitely fixes the random file size problem with
this dataset. I think we need to add this in for Postgres.
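For reference, the patched batches look something like this (again
with made-up names; table_pkey stands in for whatever the table's
primary-key column is):

-- same two batches, now ordered by the primary key
SELECT gene_id, symbol FROM gene_main ORDER BY table_pkey LIMIT 200 OFFSET 0;
SELECT gene_id, symbol FROM gene_main ORDER BY table_pkey LIMIT 200 OFFSET 200;

Because the primary key gives a unique, stable ordering, consecutive
LIMIT/OFFSET batches now partition the result set consistently.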
Cheers
Damian
On Mon, Apr 20, 2009 at 12:06 PM, Damian Smedley <[email protected]> wrote:
I got the dump ok! I have loaded it onto my server and local BioMart
and see the same behaviour.
I think I can get some consistent weirdness now:
1 Go to your KOMP mart and preview the first 200 results for the
'all gene symbols' query. Hitting the Results button again generally
keeps returning the same list, which is good.
2 From this same preview page now export all the results using the
GO button
3 Now hit the Results button again to regenerate the first 200
results and you will see a different order.
You can keep repeating this trick in steps 2 and 3 to see how the
table seems to get shuffled at some stage during the whole list
output. Interestingly after enough shuffles it seems to come
back to
the original order - the shuffling does not seem to be totally
random!
If you try the same trick with one of your MySQL datasets this
does
not happen.
It seems to me Postgres has some sort of behaviour where it shuffles
the table every so often (recaches it, I guess). We need to
investigate whether there is a setting to turn this off - if not,
then I think we need to add an ORDER BY on the primary key to all
Postgres batches.
Cheers
Damian
On Mon, Apr 20, 2009 at 11:56 AM, Darren Oakley <[email protected]> wrote:
Oops, looks like the database dump was too large to email...
I've uploaded it to one of our web servers instead:
http://www.i-dcc.org/dev/komp_dcc_mart.sql.gz
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.