Hi all,

Referencing http://listserver.ebi.ac.uk/mailing-lists-archives/mart-dev/msg01094.html , I observed duplicated gene rows ('gene duplication') when exporting genes filtered by a single chromosome. I wonder whether the explanation is the same as in that thread, despite the single-chromosome filter.

Here's the query used (via http://www.biomart.org/biomart/martview/):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName = "default" header = "0" count = "" softwareVersion = "0.5" >
    <Dataset name = "rnorvegicus_gene_ensembl" interface = "default" >
        <Attribute name = "ensembl_gene_id" />
        <Attribute name = "chromosome_name" />
        <Attribute name = "description" />
        <Attribute name = "start_position" />
        <Attribute name = "end_position" />
        <Attribute name = "strand" />
        <Filter name = "chromosome_name" value = "X"/>
        <Filter name = "with_entrezgene" excluded = "0"/>
        <Filter name = "transcript_status" value = "KNOWN"/>
        <Filter name = "biotype" value = "protein_coding"/>
    </Dataset>
</Query>

Here's a link to the results via the webservice:

http://www.biomart.org/biomart/martservice?query=%3C?xml%20version=%221.0%22%20encoding=%22UTF-8%22?%3E%3C!DOCTYPE%20Query%3E%3CQuery%20%20virtualSchemaName%20=%20%22default%22%20header%20=%20%220%22%20count%20=%20%22%22%20softwareVersion%20=%20%220.5%22%20%3E%3CDataset%20name%20=%20%22rnorvegicus_gene_ensembl%22%20interface%20=%20%22default%22%20%3E%3CAttribute%20name%20=%20%22ensembl_gene_id%22%20/%3E%3CAttribute%20name%20=%20%22chromosome_name%22%20/%3E%3CAttribute%20name%20=%20%22description%22%20/%3E%3CAttribute%20name%20=%20%22start_position%22%20/%3E%3CAttribute%20name%20=%20%22end_position%22%20/%3E%3CAttribute%20name%20=%20%22strand%22%20/%3E%3CFilter%20name%20=%20%22chromosome_name%22%20value%20=%20%22X%22/%3E%3CFilter%20name%20=%20%22with_entrezgene%22%20excluded%20=%20%220%22/%3E%3CFilter%20name%20=%20%22transcript_status%22%20value%20=%20%22KNOWN%22/%3E%3CFilter%20name%20=%20%22biotype%22%20value%20=%20%22protein_coding%22/%3E%3C/Dataset%3E%3C/Query%3E

And here's a portion of the output (sorted ascending by ensembl_gene_id):

ENSRNOG00000002437 X 124527886 124528491 -1
ENSRNOG00000002449 X melanoma antigen family D, 2 [Source:RefSeq_peptide;Acc:NP_536727] 40056332 40064505 -1
ENSRNOG00000002449 X melanoma antigen family D, 2 [Source:RefSeq_peptide;Acc:NP_536727] 40056332 40064505 -1
ENSRNOG00000002451 X 94617360 94687407 -1

And one more sample containing a duplicate:

ENSRNOG00000003622 X cytochrome b-245, beta polypeptide [Source:RefSeq_peptide;Acc:NP_076455] 25514572 25547181 -1
ENSRNOG00000003667 X Dystrophin (Fragment). [Source:Uniprot/SWISSPROT;Acc:P11530] 69607890 71671414 1
ENSRNOG00000003667 X Dystrophin (Fragment). [Source:Uniprot/SWISSPROT;Acc:P11530] 69607890 71671414 1
ENSRNOG00000003674 X pirin [Source:RefSeq_peptide;Acc:NP_001009474] 50864981 50974861 -1

There might be other duplicates; I just didn't look for more.

So is the reason the same as described in http://listserver.ebi.ac.uk/mailing-lists-archives/mart-dev/msg01094.html , or is this something different?

Thanks in advance,

--
Sincerely yours,
Bogdan Tokovenko,
PhD student at the Laboratory of Protein Biosynthesis,
Department of Genetic Information Translation Mechanisms,
Institute of Molecular Biology and Genetics,
Kyiv, Ukraine
http://bogdan.org.ua/
