On 21/08/08 9:32 AM, "Arnaud Kerhornou" <[EMAIL PROTECTED]> wrote:

> Hi,
>
> We're running into a problem in populating our biomart databases for our
> Ensembl datasets. The
> problem lies in the population of the *_gene_ensembl__exon_transcript__dm mart
> tables.
>
> I take scerevisiae as an example because it's being done fine by the Ensembl
> team, so I don't understand what's wrong with our procedure.
> We're using a saccharomyces_cerevisiae_core_50_1i database installed on
> our local mysql server. The data were dumped from ensembl ftp site, and
> we're using ensembl_50.xml that Ensembl gave us. We just adapted it to our
> database connectivity.
>
> The problem is that this table has only data for cerevisiae genes
> attached to the mitochondrial chromosome.
>
> The problem lies with this mart building query:
>
> create table fungal_mart_redo_50.TEMP28 as select a.*,b.value as
> value_1060,b.attrib_type_id as attrib_type_id_1060 from
> fungal_mart_redo_50.TEMP26 as a inner join
> saccharomyces_cerevisiae_core_50_1i.seq_region_attrib as b on
> a.seq_region_id_1015=b.seq_region_id and (b.attrib_type_id=11 or
> b.attrib_type_id is null);
>
> TEMP26 is fine and has data for every chromosomes, TEMP28 has only data
> for the mitochondrial chromosome.
>
> What happens is that seq_region_attr table has only attrib_type_id=11
> for the mitochondrial seq_region. I haven't seen any rows where
> seq_region_attr.attrib_type_id is null
>
> attrib_type_id=11 corresponds to 'Codon Table' attribute.
>
> So I guess it tries to add codon table information into
> *_gene_ensembl__exon_transcript__dm. It seems that ensembl by default
> doesn't attach a codon table to a seq_region, I guess because they are
> all 1, except for mitochndrial or chloroplast genes.
>
> So, we must be missing something in our mart building procedure ? Does
> it ring a bell to you ?
> I can fix this issue by replacing the inner join by a left join. Should this
> be enough ?
>
>
>
> We're facing another problem. attrib_type_id=11 might not be true for all
> datasets. We also have Ensembl core databases that we built ourselves, and it
> turns out that
> the attrib_type corresponding to 'Codon Table' has attrib_type_id=3.
> I mean, we want to set up a fungal mart for two datasets, scerevisiae and
> spombe. So we have a build xml file with attrib_type_id=11 so we get
> scerevisiae ok (assuming we replace the above inner join by a left join),
> but we don't get spombe right as it requires attrib_type_id=3 !
>
> So, is the only way to fix it, to have consistently the same attrib_type_id
> value for all datasets or could you suggest another solution that does not
> rely on a predefined
> attrib_type_id value ?
>
> Thanks
> Arnaud
>
>


Hi Arnaud,
I am cc'ing ensembl team because we need some input from them on the
attribute_type_id values etc ...

A.




>

Reply via email to