Re: [mart-dev] Canned queries 2 - How to modify results set before display

Roger Hull Wed, 19 Mar 2008 10:05:11 -0700

Hi Arek,

Thanks for the positive reply. It is particularly important to us that you have the policy of maintaining backward compatibility.

You say of counts: "For large datasets this frequently ends up taking longer than the preview of the results for the actual query" - I don't see this as a big problem, as the queries seem to run fast anyway. It is more important to be able to get the answer, even if it takes a bit longer.

You answered that the preliminary ETA for BioMart 0.7 is April, but didn't comment on a published list of enhancements and bug fixes for that release. Such a list would be really helpful. A list of known or suspected bugs would save users time if they hit an already known bug, as the first assumption of a user is that it is something he or she has done wrong, and sometimes only after much time spent investigating will it get reported to the mart-dev list.

For enhancements, I am particularly interested in extension of your queries to cover the standard types of query supported by SQL, for example "GROUP BY" (in conjunction with counts) and WHERE clauses that have an expression involving more than one column (attribute). These constructions often provide quick answers to common questions about the datasets, like "how many proteins are there of each of the following types...".
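
As an illustration of the two SQL constructions mentioned above, here is a minimal sketch using Python's standard sqlite3 module. The table and column names (protein, protein_type, observed_mass, predicted_mass) are invented for the example and are not the PRIDE or BioMart schema.

```python
import sqlite3

# Hypothetical table for illustration only; not the PRIDE or BioMart schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE protein (accession TEXT, protein_type TEXT, "
    "observed_mass REAL, predicted_mass REAL)"
)
conn.executemany(
    "INSERT INTO protein VALUES (?, ?, ?, ?)",
    [
        ("P1", "kinase", 50.0, 48.0),
        ("P2", "kinase", 61.0, 61.5),
        ("P3", "transporter", 75.0, 70.0),
        ("P4", "receptor", 90.0, 91.0),
    ],
)

# GROUP BY in conjunction with counts: how many proteins of each type?
counts_by_type = conn.execute(
    "SELECT protein_type, COUNT(*) FROM protein "
    "GROUP BY protein_type ORDER BY protein_type"
).fetchall()

# WHERE clause whose expression involves more than one column (attribute).
heavier_than_predicted = conn.execute(
    "SELECT accession FROM protein "
    "WHERE observed_mass > predicted_mass ORDER BY accession"
).fetchall()

print(counts_by_type)
print(heavier_than_predicted)
```

With the sample rows above, the first query gives one count per protein type (2 kinases, 1 receptor, 1 transporter) and the second returns P1 and P3.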


I await with interest a further response from Syed.

Regards,
Roger

Arek Kasprzyk wrote:

On 12-Mar-08, at 3:44 PM, Roger Hull wrote:
Hi Syed,
Hi Roger,
I'll let Syed reply in detail to your questions, but let me put this into the context of our future developments.
Thanks for the advice. I think we see quite big risks in making changes which depend too much on our understanding of how your code works. This could present problems in maintaining the code we have written, or changed, in the future, when BioMart is upgraded or we need to add new features.
I would like to ask some questions about BioMart upgrades:
(A) Will you maintain backward compatibility for a BioMart 0.6 installation which gets its data from a remote BioMart MartService, when the remote BioMart is upgraded to 0.7,..., 1.0, etc.?
We do currently maintain compatibility between our 0.6 service and all other marts which run earlier versions. The central server is a 'translation point' which maintains backward compatibility. Not all of the BioMart servers accessible from our central server are 0.6, but you can query them in a uniform fashion. It is our intention to maintain this compatibility in a similar way in the future.
(B) Do you have a date for BioMart 0.7, and have you published a list of enhancements and bug fixes for that release?
The preliminary ETA is April.
(C) There seem to be a number of BioMart installations where people have modified your code. As I said, I'm reluctant to follow this path, but I wonder if you could consider adding functions, hooks, or similar mechanisms in your code so that the BioMart behaviour can be changed in various ways without modifying your code? (The files header.tt and footer.tt are useful, but it is limited what can be done by adding code to these files.)
Yes, the web code will undergo a major re-organization for 0.8, which will coincide with the release of the new configuration system. One of the main goals of this 're-organization' is to provide a flexible framework by which people can extend the code, both in terms of the web GUI and also things like visualization etc. This is still at a very early stage, so any suggestions are very welcome.
A couple of suggestions, based on what I have been wanting to do:
(C1) I would like to call your AJAX functions for my own purposes. But your function doAjaxMagic(toDo) only supports the two values of toDo = 'countByAjax' and 'resultsByAjax'. Could you provide a general purpose function which anyone could use? - with an argument to specify the URL which will handle the request and another to supply a function to handle the results (preferably supporting POST as well as GET). Then use this function internally to implement doAjaxMagic.
If I modify your current function, or make a modified copy, then I have to maintain this code in the future if you change the function internally (e.g. to support new browser versions).
(C2) Could you implement, and document, a callback function in perl, which has as input arguments the results of a query in a parsable format (maybe XML), so that by default this function returns the input results unchanged. Then if the user wants to filter or otherwise modify the result data, this can be done by adding code to this function, and modifying the data before returning it to the caller. [OK, it's not quite as simple as this, because you batch the results data and return a certain number of result rows at a time, and if some are filtered out, some more have to be processed to make up the number, but I'm sure a solution can be found.]
When I looked at your code to see how and where I might do this type of result modification, so that various formats (HTML, CSV, etc) and various output methods (display in MartView, download to a file), are all supported, it needed quite a lot of study of your code, and might well present difficulties to maintain.
You have a perl API based upon BioMart::QueryRunner, which you use internally in your code, but the only way I can see to get at the results using this API is $query_runner->printResults(), which gives already formatted results. I want to get at the results before formatting, so I don't have to parse data formatted in various ways, and reformat the data after modification.
Apart from filtering the results, another common requirement would be to add links to fields when the results are to be formatted as HTML, or to add another column of fields fetched from a local non-BioMart database, using one of the BioMart attributes in the results as an index to retrieve data from this database.
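The callback asked for in (C2), including the batching complication, can be sketched roughly as follows. This is only an illustration of the logic, written in Python rather than perl, and it is not BioMart code: the names batches, default_callback and my_callback, the row layout, and the local_db lookup are all assumptions made up for the example.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Optional

Row = dict  # one unformatted result row, keyed by attribute name (assumed layout)


def default_callback(row: Row) -> Optional[Row]:
    """Default hook: return the row unchanged. Returning None drops the row."""
    return row


def batches(rows: Iterable[Row], batch_size: int,
            callback: Callable[[Row], Optional[Row]] = default_callback
            ) -> Iterator[List[Row]]:
    """Yield lists of batch_size processed rows; only the last may be shorter."""
    # The callback is applied lazily, before batching, so a dropped row is
    # made up for by pulling further rows from the underlying result stream.
    processed = (out for out in map(callback, rows) if out is not None)
    while True:
        batch = list(islice(processed, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical local, non-BioMart data keyed by an attribute in the results.
local_db = {"P2": "curated", "P5": "predicted", "P8": "curated"}


def my_callback(row: Row) -> Optional[Row]:
    """Example user hook: filter rows, then add a column from local_db."""
    if row["peptide_count"] < 2:
        return None
    return {**row, "local_note": local_db.get(row["accession"], "")}


rows = [{"accession": f"P{i}", "peptide_count": i % 3} for i in range(1, 11)]
result = list(batches(rows, 2, my_callback))
print(result)
```

Because the filtering happens before the rows are grouped, each batch is filled to the requested size without the caller having to know how many rows were dropped. A formatter (HTML, CSV, etc) would then consume the batches after modification, which is the ordering asked for above.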
(D) As I mentioned earlier, the Count returned by BioMart is "how many rows in the main table of the dataset match your filters so far". In the case of PRIDE, this is the number of experiments. But often I need to know the number of rows returned from the query, not the number of rows from the main table - for example I want to know the number of distinct proteins or peptides which match my filters. Will you support a more usual "number of rows returned" count in the future?
There are two problems here: first, the counts for multiple mains; second, the count of the actual number of rows returned. As far as the mains are concerned (which I think would solve your protein problem for PRIDE), we do intend to provide a proper count for each main table, i.e. your query has selected this many experiments, proteins, etc., for however many mains you happen to have, as opposed to what we support now, which is the top main only. As far as the number of rows is concerned, the implementation is really trivial for single datasets and the only concern there was the unpredictable performance. For large datasets this frequently ends up taking longer than the preview of the results for the actual query, so it does not always work nicely in an interactive environment. A bigger challenge is to provide a row count for federated queries. We are currently thinking through all possible scenarios for all of them.
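The difference between the three counts discussed in (D) can be shown with a small sketch, again using Python's sqlite3 module. The two tables (experiment as the main table, identification holding the proteins) and their contents are invented for the example; they are not the PRIDE mart schema, and the queries are not how BioMart computes its counts.

```python
import sqlite3

# Hypothetical schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE experiment (experiment_id INTEGER PRIMARY KEY, species TEXT);
CREATE TABLE identification (experiment_id INTEGER, protein TEXT);
INSERT INTO experiment VALUES (1, 'human'), (2, 'human'), (3, 'mouse');
INSERT INTO identification VALUES
    (1, 'P1'), (1, 'P2'), (2, 'P2'), (2, 'P3'), (2, 'P4'), (3, 'P1');
""")

# The same filter (species = 'human') is used for all three counts.
filtered = ("FROM experiment e JOIN identification i "
            "ON i.experiment_id = e.experiment_id WHERE e.species = 'human'")

# Count of matching rows in the main table (experiments).
main_count = conn.execute(
    "SELECT COUNT(DISTINCT e.experiment_id) " + filtered).fetchone()[0]
# Count of rows the query itself would return.
row_count = conn.execute("SELECT COUNT(*) " + filtered).fetchone()[0]
# Count of distinct proteins matching the filter.
protein_count = conn.execute(
    "SELECT COUNT(DISTINCT i.protein) " + filtered).fetchone()[0]

print(main_count, row_count, protein_count)
```

For this sample data the filter matches 2 experiments, the query returns 5 rows, and those rows contain 4 distinct proteins.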
a.
Regards,
Roger.
