On 27 Sep 2006, at 14:27, Paul Fisher wrote:
Hello,
I am currently working with BioMart to try to export SNP identifiers in
the Mouse organism based on a chromosomal region (a gene start and end
position). These are the only filters chosen on the filters pane. In the
Attributes pane I have chosen to export the RefSnp Id, the strain name,
the strain allele, the Ensembl stable gene Id, the chromosome, the
chromosomal position of the SNP, the Ensembl gene location (coding etc),
and validation.
However, when I obtain the results of SNPs for the gene GPX2 (start
position = 77,711,490; end position = 77,714,215), I only obtain 2
different RefSnp Ids, compared to the 12 RefSnp Ids found in dbSNP at
the NCBI website. After I investigated the cause of this, I found that
the Strain attribute in BioMart was acting as a filter and not a display
option: any SNPs with no strain information would not be included in the
output.
Could you please tell me if this is intentional, and if so, the reason?
Many thanks,
Paul Fisher
Hi Paul,
Yes, you are right, this is a confusing issue. The data model is such
that the table you join to contains only SNPs with strains and hence
acts as an unintentional 'filter'. All the other dimension tables are
the result of a precomputed left join with the main table and contain
all records, including the ones with NULL values. In those cases you get
the correct behaviour. However, the SNP strain table is not 'left joined
beforehand'.
We are going to correct this behaviour in the near future and make it
behave just the same as other tables.
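To make the difference concrete, here is a minimal sketch of the effect
using Python's sqlite3 module. The table names (snp_main, snp_strain),
the columns and the identifiers are all hypothetical and chosen only for
illustration; they are not the actual BioMart schema or data. The sketch
assumes a main table with three SNPs, only one of which has a strain
record, and compares a plain (inner) join with a left join.

```python
import sqlite3


def run_joins():
    """Return (inner_rows, left_rows) for a toy SNP/strain data set."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Hypothetical schema: one main table, one strain dimension table.
    cur.execute("CREATE TABLE snp_main (refsnp_id TEXT PRIMARY KEY, chrom_pos INTEGER)")
    cur.execute("CREATE TABLE snp_strain (refsnp_id TEXT, strain_name TEXT, allele TEXT)")

    # Three made-up SNPs; only rs_a has strain information.
    cur.executemany(
        "INSERT INTO snp_main VALUES (?, ?)",
        [("rs_a", 100), ("rs_b", 200), ("rs_c", 300)],
    )
    cur.executemany(
        "INSERT INTO snp_strain VALUES (?, ?, ?)",
        [("rs_a", "strain_1", "A")],
    )

    # Inner join: SNPs without a strain row are dropped, so the
    # strain attribute behaves like an unintended filter.
    inner_rows = cur.execute(
        "SELECT m.refsnp_id, s.strain_name "
        "FROM snp_main m JOIN snp_strain s ON m.refsnp_id = s.refsnp_id "
        "ORDER BY m.refsnp_id"
    ).fetchall()

    # Left join: every SNP is kept; missing strain values come back as NULL (None).
    left_rows = cur.execute(
        "SELECT m.refsnp_id, s.strain_name "
        "FROM snp_main m LEFT JOIN snp_strain s ON m.refsnp_id = s.refsnp_id "
        "ORDER BY m.refsnp_id"
    ).fetchall()

    conn.close()
    return inner_rows, left_rows


inner_rows, left_rows = run_joins()
print("inner join:", inner_rows)
print("left join: ", left_rows)
```

In this toy case the inner join returns one row and the left join
returns three, which mirrors how selecting a strain attribute can reduce
the number of RefSnp Ids in the output.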
a.
-------------------------------------------------------------------------------
Arek Kasprzyk
EMBL-European Bioinformatics Institute.
Wellcome Trust Genome Campus, Hinxton,
Cambridge CB10 1SD, UK.
Tel: +44-(0)1223-494606
Fax: +44-(0)1223-494468
-------------------------------------------------------------------------------