Hi,

We're running into a problem in populating our biomart databases for our Ensembl datasets. The problem lies in the population of the *_gene_ensembl__exon_transcript__dm mart tables.

I take scerevisiae as an example because it's being done fine by the Ensembl 
team, so I don't understand what's wrong with our procedure.
We're using a saccharomyces_cerevisiae_core_50_1i database installed on our local mysql server. The data were dumped from ensembl ftp site, and we're using ensembl_50.xml that Ensembl gave us. We just adapted it to our database connectivity.

The problem is that this table has only data for cerevisiae genes attached to the mitochondrial chromosome.

The problem lies with this mart building query:

create table fungal_mart_redo_50.TEMP28 as select a.*,b.value as value_1060,b.attrib_type_id as attrib_type_id_1060 from fungal_mart_redo_50.TEMP26 as a inner join saccharomyces_cerevisiae_core_50_1i.seq_region_attrib as b on a.seq_region_id_1015=b.seq_region_id and (b.attrib_type_id=11 or b.attrib_type_id is null);

TEMP26 is fine and has data for every chromosomes, TEMP28 has only data for the mitochondrial chromosome.

What happens is that seq_region_attr table has only attrib_type_id=11 for the mitochondrial seq_region. I haven't seen any rows where seq_region_attr.attrib_type_id is null

attrib_type_id=11 corresponds to 'Codon Table' attribute.

So I guess it tries to add codon table information into *_gene_ensembl__exon_transcript__dm. It seems that ensembl by default doesn't attach a codon table to a seq_region, I guess because they are all 1, except for mitochndrial or chloroplast genes.

So, we must be missing something in our mart building procedure ? Does it ring a bell to you ?
I can fix this issue by replacing the inner join by a left join. Should this be 
enough ?



We're facing another problem. attrib_type_id=11 might not be true for all datasets. We also have Ensembl core databases that we built ourselves, and it turns out that the attrib_type corresponding to 'Codon Table' has attrib_type_id=3. I mean, we want to set up a fungal mart for two datasets, scerevisiae and spombe. So we have a build xml file with attrib_type_id=11 so we get scerevisiae ok (assuming we replace the above inner join by a left join), but we don't get spombe right as it requires attrib_type_id=3 !

So, is the only way to fix it, to have consistently the same attrib_type_id value for all datasets or could you suggest another solution that does not rely on a predefined attrib_type_id value ?

Thanks
Arnaud



Reply via email to