On Tue, 27 Feb 2007, Arek Kasprzyk wrote:


On 27 Feb 2007, at 14:25, Will Spooner wrote:

<snip>

WormMart has the same gene/transcript (CDS in WormBase's case) issues. I solve this by 'merging' the multiple transcript values into a single attribute in the gene_main table. See this example:
 http://tinyurl.com/2x3bj5

The 'merged' attributes are pretty vital for WormBase, where the 'cross dimension multiplicity' (unconstrained dimension joins) is much more of an issue than for Ensembl. I would like to see this approach supported natively by BioMart/MartView, and hopefully MartBuilder as well.


This is what I was referring to as one of the ways of 'unifying' the data at the higher level. This can be done in many ways, e.g. de-normalization at the main table level, which would
solve the main->main multiplicity problem as it is in your case,
or perhaps having the dimensions with unified annotation at the higher level
This is certainly one of the options which we are now looking at for MBuilder to support

I'm not sure about the distinction between 'de-normalization on the main table' and 'dimensions with unified annotation'. Here is the WormMart approach:

 For the dimension named 'foo', the main table has the following
 attributes:
   foo_count  - the number of corresponding records in foo__dm
   foo_dmlist - the list of names of the corresponding foos, e.g.
                'foo1 | foo2 | foo3'
   foo_dminfo - the list of summaries of the corresponding foos, e.g.
                '[foo1] first foo | [foo2] another foo | [foo3] more foo'

 In addition, the dimension table has the following attributes:
   foo  - The name of the foo record, e.g. 'foo1'
   info - The summary of the foo record, e.g. '[foo1] first foo'
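
For illustration, here is a minimal Python sketch of how the three merged attributes could be derived from the dimension records. This is not the actual WormMart build code: the function name merge_dimension, the gene_key column and the sample rows are all hypothetical, and only the attribute names and the ' | ' separator come from the description above.

```python
from collections import defaultdict


def merge_dimension(dm_rows, sep=" | "):
    """Build merged main-table attributes for a dimension named 'foo'.

    dm_rows: iterable of (gene_key, foo, info) tuples, one per foo__dm
    record (gene_key is an assumed name for the main-table key).
    Returns {gene_key: {"foo_count", "foo_dmlist", "foo_dminfo"}}.
    """
    # Group the dimension records by the main-table key they belong to.
    grouped = defaultdict(list)
    for gene_key, foo, info in dm_rows:
        grouped[gene_key].append((foo, info))

    # Collapse each group into one count and two delimited strings.
    merged = {}
    for gene_key, records in grouped.items():
        merged[gene_key] = {
            "foo_count": len(records),
            "foo_dmlist": sep.join(foo for foo, _ in records),
            "foo_dminfo": sep.join(info for _, info in records),
        }
    return merged


# Hypothetical foo__dm rows, all belonging to the gene with key 1.
foo_dm = [
    (1, "foo1", "[foo1] first foo"),
    (1, "foo2", "[foo2] another foo"),
    (1, "foo3", "[foo3] more foo"),
]
gene_main = merge_dimension(foo_dm)
```

Note that in this sketch a gene with no foo__dm records gets no entry at all; a real build would presumably need to write foo_count = 0 for those rows.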


or even screening the output dynamically as it comes out. The first two seem to be error-free and would perform correctly. I have some doubts about the last one but I suppose we should
investigate all the options

I am also worried about this one. A possible formatter-specific approach would be e.g. for the HTML formatter to screen for duplicates on a single results page, with an indication that duplicates have been screened.
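
As a rough sketch of what I mean by per-page screening (plain Python, not MartView formatter code; screen_page and the sample rows are made up for illustration):

```python
def screen_page(rows):
    """Drop duplicate rows from one results page, keeping first occurrences.

    rows: list of row sequences for a single page.
    Returns (unique_rows, n_screened).
    """
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # rows must be hashable to be compared
        if key in seen:
            continue
        seen.add(key)
        unique.append(row)
    return unique, len(rows) - len(unique)


# Hypothetical page of results with one repeated row.
page = [("gene1", "foo1"), ("gene1", "foo1"), ("gene2", "foo2")]
unique, screened = screen_page(page)

# The formatter would show this notice so users know rows were screened.
notice = f"{screened} duplicate row(s) screened on this page" if screened else ""
```

Because the screening only sees one page at a time, duplicates that fall on different pages would not be caught, which is part of why I am uneasy about this option.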

Will




a.



Will



-------------------------------------------------------------------------------
Arek Kasprzyk
EMBL-European Bioinformatics Institute.
Wellcome Trust Genome Campus, Hinxton,
Cambridge CB10 1SD, UK.
Tel: +44-(0)1223-494606
Fax: +44-(0)1223-494468
-------------------------------------------------------------------------------



--
---
William Spooner
WormMart Developer
