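Hi Peter,

The duplicated genes in your output are the ones located in the pseudoautosomal regions (PARs), which means they exist on both the X and Y chromosomes. Both copies have the same Ensembl Gene ID, so each of these genes is exported twice with an identical header and upstream sequence.

Hope this explains the scenario.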
Cheers Syed On Wed, 2007-05-23 at 11:32 -0400, Peter Andrews wrote: > I was exporting some upstream sequences for Homo sapiens. Of the > 31,545 genes exported (no filters) I received 21 duplicates. Both the > fasta header line with '>' and the upstream sequence were identical in > all cases. Here is some debugging output showing details: > > NOTE 1th duplicate, at fasta input record 30639: [ENSG00000185960, > ENSG00000185960.4]. gene identifier 'ENSG00000185960' previously found > at fasta input record 8429 which has these geneIds: [ENSG00000185960, > ENSG00000185960.4]. Do the sequences match? true Partial old > sequence: > TAAAAAGAAAAGTGTTTCCTCCCTGGCTGGAGGACCCAGGAGGAGGTCCCAGTTTTCCGGTGGGGATGGGCGTGGAGTAGGGGGCGGGGAAGGGATGAGG > Partial new sequence: > TAAAAAGAAAAGTGTTTCCTCCCTGGCTGGAGGACCCAGGAGGAGGTCCCAGTTTTCCGGTGGGGATGGGCGTGGAGTAGGGGGCGGGGAAGGGATGAGG > NOTE 2th duplicate, at fasta input record 30727: [ENSG00000197976, > ENSG00000197976.2]. gene identifier 'ENSG00000197976' previously found > at fasta input record 9268 which has these geneIds: [ENSG00000197976, > ENSG00000197976.2]. Do the sequences match? true Partial old > sequence: > CCTTCCCCTCCCCTCCCCTCCTTTCCCTTCCCCTCCCCTCCTCTCCCTTCCCCTCCCCTCCTCTCCCTTCCCCTCCCTTCCCCTCCATTCCCCTCCCTTC > Partial new sequence: > CCTTCCCCTCCCCTCCCCTCCTTTCCCTTCCCCTCCCCTCCTCTCCCTTCCCCTCCCCTCCTCTCCCTTCCCCTCCCTTCCCCTCCATTCCCCTCCCTTC > NOTE 3th duplicate, at fasta input record 30730: [ENSG00000182162, > ENSG00000182162.2]. gene identifier 'ENSG00000182162' previously found > at fasta input record 9310 which has these geneIds: [ENSG00000182162, > ENSG00000182162.2]. Do the sequences match? true Partial old > sequence: > TTTATTTGTTTATTTATTTATTTTTTGAGACAGAGTTTCGCTCTTGTTGCCCAGGCTGGGGTGCAGCGGCATGATCTCGGCTCACTGCAACCTCCGCCTC > Partial new sequence: > TTTATTTGTTTATTTATTTATTTTTTGAGACAGAGTTTCGCTCTTGTTGCCCAGGCTGGGGTGCAGCGGCATGATCTCGGCTCACTGCAACCTCCGCCTC > NOTE 4th duplicate, at fasta input record 30798: [ENSG00000205681, > ENSG00000205681.1]. 
gene identifier 'ENSG00000205681' previously found > at fasta input record 10007 which has these geneIds: [ENSG00000205681, > ENSG00000205681.1]. Do the sequences match? true Partial old > sequence: > GCTATGGCGCTTGGCTACCTGAGTCTTTATTCTGCCTTCCAGGTGCTTGTTGGTTGGATAACTTTGGGTAGGTTCTTGTACCTCTTTGAGCTTCAAGACT > Partial new sequence: > GCTATGGCGCTTGGCTACCTGAGTCTTTATTCTGCCTTCCAGGTGCTTGTTGGTTGGATAACTTTGGGTAGGTTCTTGTACCTCTTTGAGCTTCAAGACT > NOTE 5th duplicate, at fasta input record 30820: [ENSG00000124343, > ENSG00000124343.2]. gene identifier 'ENSG00000124343' previously found > at fasta input record 7623 which has these geneIds: [ENSG00000124343, > ENSG00000124343.2]. Do the sequences match? true Partial old > sequence: > CTAATCTCCAGTGATCCGCTCACCTCAGCCACCCAAAGTGCTGGGATTACAGACGTGAGCCACCGGGCCCAGCCAGCAGGGCTGATTTCTTCTGATGCTG > Partial new sequence: > CTAATCTCCAGTGATCCGCTCACCTCAGCCACCCAAAGTGCTGGGATTACAGACGTGAGCCACCGGGCCCAGCCAGCAGGGCTGATTTCTTCTGATGCTG > NOTE 6th duplicate, at fasta input record 30844: [ENSG00000124333, > ENSG00000124333.4]. gene identifier 'ENSG00000124333' previously found > at fasta input record 19603 which has these geneIds: [ENSG00000124333, > ENSG00000124333.4]. Do the sequences match? true Partial old > sequence: > AGGAAAAATAGCTAATGCATGCTGGGCTTTAATACCTAGGTGATGGGTTGATAGGTGCAGCAAATTACCATGGCACACATTTACCTGTATAACAAACCTG > Partial new sequence: > AGGAAAAATAGCTAATGCATGCTGGGCTTTAATACCTAGGTGATGGGTTGATAGGTGCAGCAAATTACCATGGCACACATTTACCTGTATAACAAACCTG > NOTE 7th duplicate, at fasta input record 30934: [ENSG00000198223, > ENSG00000198223.3]. gene identifier 'ENSG00000198223' previously found > at fasta input record 8798 which has these geneIds: [ENSG00000198223, > ENSG00000198223.3]. Do the sequences match? 
true Partial old > sequence: > TCCTGCAGGAATGGGGAGGCTAAGACGGTAGAGGTGCAGCCTGGTCAGCCATCTTTCACCTTTGCTGATGTTGCTATCCAGGTGTTTTCCATTGCATGTG > Partial new sequence: > TCCTGCAGGAATGGGGAGGCTAAGACGGTAGAGGTGCAGCCTGGTCAGCCATCTTTCACCTTTGCTGATGTTGCTATCCAGGTGTTTTCCATTGCATGTG > NOTE 8th duplicate, at fasta input record 30968: [ENSG00000205755, > ENSG00000205755.1]. gene identifier 'ENSG00000205755' previously found > at fasta input record 9187 which has these geneIds: [ENSG00000205755, > ENSG00000205755.1]. Do the sequences match? true Partial old > sequence: > GACGGAGTCTTGCTCTTGTCGCCCAGGCTGGAGTGCCGTGGCACGATCTCAGCTCACTGCCAACTCCGCCTCCCGGGTTCACGCCATTCTCCTGCCTCAG > Partial new sequence: > GACGGAGTCTTGCTCTTGTCGCCCAGGCTGGAGTGCCGTGGCACGATCTCAGCTCACTGCCAACTCCGCCTCCCGGGTTCACGCCATTCTCCTGCCTCAG > NOTE 9th duplicate, at fasta input record 31013: [ENSG00000196433, > ENSG00000196433.2]. gene identifier 'ENSG00000196433' previously found > at fasta input record 9741 which has these geneIds: [ENSG00000196433, > ENSG00000196433.2]. Do the sequences match? true Partial old > sequence: > GCCAATATAGTGAAACCCTGTCTCTACGAAAAATACAAAAATTAGCCAGGTATGGTGGCAGGTGCTTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGAA > Partial new sequence: > GCCAATATAGTGAAACCCTGTCTCTACGAAAAATACAAAAATTAGCCAGGTATGGTGGCAGGTGCTTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGAA > NOTE 10th duplicate, at fasta input record 31022: [ENSG00000168939, > ENSG00000168939.2]. gene identifier 'ENSG00000168939' previously found > at fasta input record 20849 which has these geneIds: [ENSG00000168939, > ENSG00000168939.2]. Do the sequences match? true Partial old > sequence: > GAGACAGCCTGAGTCAGCCTGAGTTAAAATCCTAGATCTGCAAACTGCCAACTGTGTAACCTTGGACAAGTTACTTAAGGTCTTTGGACCTTGGTTTCTC > Partial new sequence: > GAGACAGCCTGAGTCAGCCTGAGTTAAAATCCTAGATCTGCAAACTGCCAACTGTGTAACCTTGGACAAGTTACTTAAGGTCTTTGGACCTTGGTTTCTC > NOTE 11th duplicate, at fasta input record 31055: [ENSG00000169100, > ENSG00000169100.3]. 
gene identifier 'ENSG00000169100' previously found > at fasta input record 7624 which has these geneIds: [ENSG00000169100, > ENSG00000169100.3]. Do the sequences match? true Partial old > sequence: > AGCCAGCCTCATCTGGAAATAGCAGCTCTGGTCCCGGCCTCGCTGAGGCACTGAAAACCAGCACCAGGGCCCCGTCCAGCCCGGCCTCGCTGAGGCTGGG > Partial new sequence: > AGCCAGCCTCATCTGGAAATAGCAGCTCTGGTCCCGGCCTCGCTGAGGCACTGAAAACCAGCACCAGGGCCCCGTCCAGCCCGGCCTCGCTGAGGCTGGG > NOTE 12th duplicate, at fasta input record 31115: [ENSG00000185291, > ENSG00000185291.3]. gene identifier 'ENSG00000185291' previously found > at fasta input record 8154 which has these geneIds: [ENSG00000185291, > ENSG00000185291.3]. Do the sequences match? true Partial old > sequence: > AGGCTGGTCTTGAACCCCTGACCTCAGGTGATGCACCCACCTTGGCCTCCCACAGAGCTGGGATTACAGGCGTGAGCCACTGGGCCCCGCCCTGTATTTG > Partial new sequence: > AGGCTGGTCTTGAACCCCTGACCTCAGGTGATGCACCCACCTTGGCCTCCCACAGAGCTGGGATTACAGGCGTGAGCCACTGGGCCCCGCCCTGTATTTG > NOTE 13th duplicate, at fasta input record 31130: [ENSG00000124334, > ENSG00000124334.6]. gene identifier 'ENSG00000124334' previously found > at fasta input record 19934 which has these geneIds: [ENSG00000124334, > ENSG00000124334.6]. Do the sequences match? true Partial old > sequence: > CTTTTCTCTTAAGCATGGGTGACATAGTACTCTTTCTTCATGTGTTTGATAAATTTGTTTTTATCTTAGAAATTGTGAATGGTATACATTGTTGAGACTG > Partial new sequence: > CTTTTCTCTTAAGCATGGGTGACATAGTACTCTTTCTTCATGTGTTTGATAAATTTGTTTTTATCTTAGAAATTGTGAATGGTATACATTGTTGAGACTG > NOTE 14th duplicate, at fasta input record 31198: [ENSG00000169084, > ENSG00000169084.3]. gene identifier 'ENSG00000169084' previously found > at fasta input record 9163 which has these geneIds: [ENSG00000169084, > ENSG00000169084.3]. Do the sequences match? 
true Partial old > sequence: > ATTACCTGAGGTCAGGAGTTTGAGACCAGCCAGGCCAACATGGTGAAATCCCATCTCTATTAAAAATACGAAAATTATTTGGGTGTGCTGGTGCATGCCT > Partial new sequence: > ATTACCTGAGGTCAGGAGTTTGAGACCAGCCAGGCCAACATGGTGAAATCCCATCTCTATTAAAAATACGAAAATTATTTGGGTGTGCTGGTGCATGCCT > NOTE 15th duplicate, at fasta input record 31327: [ENSG00000182484, > ENSG00000182484.4]. gene identifier 'ENSG00000182484' previously found > at fasta input record 19614 which has these geneIds: [ENSG00000182484, > ENSG00000182484.4]. Do the sequences match? true Partial old > sequence: > ATGCATTCAGAAAACTTTAGATCACGGTTGAGAAGAATCAAAAATATTAAATCAAATGCAGATACTCCTTGTTTAGGAGCAGTACACTCATTATTGTTAG > Partial new sequence: > ATGCATTCAGAAAACTTTAGATCACGGTTGAGAAGAATCAAAAATATTAAATCAAATGCAGATACTCCTTGTTTAGGAGCAGTACACTCATTATTGTTAG > NOTE 16th duplicate, at fasta input record 31342: [ENSG00000002586, > ENSG00000002586.7]. gene identifier 'ENSG00000002586' previously found > at fasta input record 8086 which has these geneIds: [ENSG00000002586, > ENSG00000002586.7]. Do the sequences match? true Partial old > sequence: > AGCCTGTACCCCAGAACTTAAAGTATAATAATAACAATAATAAAAAGACAGGTGTTATCTCAGAGCCCCTGACTCAGTCGGCTGGGCAGCAAGTATGCCA > Partial new sequence: > AGCCTGTACCCCAGAACTTAAAGTATAATAATAACAATAATAAAAAGACAGGTGTTATCTCAGAGCCCCTGACTCAGTCGGCTGGGCAGCAAGTATGCCA > NOTE 17th duplicate, at fasta input record 31373: [ENSG00000182378, > ENSG00000182378.3]. gene identifier 'ENSG00000182378' previously found > at fasta input record 8467 which has these geneIds: [ENSG00000182378, > ENSG00000182378.3]. Do the sequences match? true Partial old > sequence: > GACCACAGTCCACATCACACCAGGACACGGAGGAAGGGCCAGGCCTCATGACCACAGTCCAGATCACACCAGGACACAGAGGAAGGGCCGGGCCCTGTGA > Partial new sequence: > GACCACAGTCCACATCACACCAGGACACGGAGGAAGGGCCAGGCCTCATGACCACAGTCCAGATCACACCAGGACACAGAGGAAGGGCCGGGCCCTGTGA > NOTE 18th duplicate, at fasta input record 31428: [ENSG00000169093, > ENSG00000169093.5]. 
gene identifier 'ENSG00000169093' previously found > at fasta input record 9036 which has these geneIds: [ENSG00000169093, > ENSG00000169093.5]. Do the sequences match? true Partial old > sequence: > TATTCCTTGATTTCAGATGTCTGGGCTCCAGAGCTGTAATACAATTAAGTTTTGCTGTTTTAAGCCCCAGGGTTTTGAGTGACAGTTACCAGCAACCCCC > Partial new sequence: > TATTCCTTGATTTCAGATGTCTGGGCTCCAGAGCTGTAATACAATTAAGTTTTGCTGTTTTAAGCCCCAGGGTTTTGAGTGACAGTTACCAGCAACCCCC > NOTE 19th duplicate, at fasta input record 31442: [ENSG00000167393, > ENSG00000167393.7]. gene identifier 'ENSG00000167393' previously found > at fasta input record 9170 which has these geneIds: [ENSG00000167393, > ENSG00000167393.7]. Do the sequences match? true Partial old > sequence: > CCCAGCAAACTCTGCAACACCTCAGGCCCTGCCAGCCTTGGGGGCCCGACAGCACCTCTTTGTTCTCCCAGAGCAAAGCCTGCACGGAGTGGGCCCCCGG > Partial new sequence: > CCCAGCAAACTCTGCAACACCTCAGGCCCTGCCAGCCTTGGGGGCCCGACAGCACCTCTTTGTTCTCCCAGAGCAAAGCCTGCACGGAGTGGGCCCCCGG > NOTE 20th duplicate, at fasta input record 31485: [ENSG00000178605, > ENSG00000178605.4]. gene identifier 'ENSG00000178605' previously found > at fasta input record 9699 which has these geneIds: [ENSG00000178605, > ENSG00000178605.4]. Do the sequences match? true Partial old > sequence: > NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN > Partial new sequence: > NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN > NOTE 21th duplicate, at fasta input record 31508: [ENSG00000169098, > ENSG00000169098.5]. gene identifier 'ENSG00000169098' previously found > at fasta input record 9900 which has these geneIds: [ENSG00000169098, > ENSG00000169098.5]. Do the sequences match? 
true Partial old sequence:
> GCCGGGCACGGTGGCTCACGCCTGCAATGCCAGCACTTTAGGAGGCCGAGGTGGGCAGATCACCTGAGGTCAGGAGTTCGAGACCAGCCTGGCCAACATG
> Partial new sequence:
> GCCGGGCACGGTGGCTCACGCCTGCAATGCCAGCACTTTAGGAGGCCGAGGTGGGCAGATCACCTGAGGTCAGGAGTTCGAGACCAGCCTGGCCAACATG
>
> For those interested the results should still be available for a
> little while:
> http://www.biomart.org/biomart/martresults?file=martquery_0523154530_544.txt.gz
>
> Ideas?
>
> Thanks,
>
> Peter Andrews
>
> --
> --------------
> Peter Andrews
> Computational Genetics Lab
> Dartmouth Hitchcock Medical Center
> (603) 653-3598

--
======================================
Syed Haider.
EMBL-European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton,
Cambridge CB10 1SD, UK.
======================================
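For anyone who hits the same duplicates and only needs one upstream sequence per gene, here is a minimal sketch of how the export could be collapsed after download. It is not part of BioMart. The function names (`parse_fasta`, `gene_id`, `collapse_duplicates`) are made up for illustration. It assumes that the first field of each FASTA header, split on `|` or whitespace, is the Ensembl gene ID, possibly with a version suffix such as `.4`. Adjust `gene_id` if your header layout differs.

```python
import re


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gene_id(header):
    """Return the unversioned gene ID from a header.

    Assumption: the ID is the first field, delimited by '|' or whitespace.
    """
    first_field = re.split(r"[|\s]+", header.strip(), maxsplit=1)[0]
    return first_field.split(".", 1)[0]


def collapse_duplicates(text):
    """Keep the first record per gene ID and report later repeats.

    Returns (records, duplicates), where duplicates holds tuples of
    (gene ID, first record number, repeat record number, sequences match?).
    """
    first_seen = {}
    duplicates = []
    for number, (header, sequence) in enumerate(parse_fasta(text), start=1):
        gid = gene_id(header)
        if gid in first_seen:
            first_number, _, first_sequence = first_seen[gid]
            duplicates.append((gid, first_number, number, sequence == first_sequence))
        else:
            first_seen[gid] = (number, header, sequence)
    records = [(header, sequence) for _, header, sequence in first_seen.values()]
    return records, duplicates
```

Any entry in `duplicates` whose last element is `False` would be worth inspecting by hand, since identical sequences are only expected for the PAR genes described above. Note that collapsing the records also discards the information that the gene is present on both X and Y.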
