Thanks!

Syed Haider wrote:
Hi Peter,
The genes in your output are located in the pseudoautosomal regions
(PARs), so they exist on both the X and Y chromosomes. Both copies share
the same Ensembl Gene ID, which is why each one appears twice.
Hope this explains the scenario.

Cheers
Syed

On Wed, 2007-05-23 at 11:32 -0400, Peter Andrews wrote:
  
I was exporting some upstream sequences for Homo sapiens. Of the
31,545 genes exported (no filters), 21 were duplicates. In every case
both the FASTA header line (the one starting with '>') and the upstream
sequence were identical. Here is some debugging output showing details:

NOTE 1th duplicate, at fasta input record 30639: [ENSG00000185960,
ENSG00000185960.4]. gene identifier 'ENSG00000185960' previously found
at fasta input record 8429 which has these geneIds: [ENSG00000185960,
ENSG00000185960.4].  Do the sequences match? true Partial old
sequence:
TAAAAAGAAAAGTGTTTCCTCCCTGGCTGGAGGACCCAGGAGGAGGTCCCAGTTTTCCGGTGGGGATGGGCGTGGAGTAGGGGGCGGGGAAGGGATGAGG Partial new sequence: TAAAAAGAAAAGTGTTTCCTCCCTGGCTGGAGGACCCAGGAGGAGGTCCCAGTTTTCCGGTGGGGATGGGCGTGGAGTAGGGGGCGGGGAAGGGATGAGG
NOTE 2th duplicate, at fasta input record 30727: [ENSG00000197976,
ENSG00000197976.2]. gene identifier 'ENSG00000197976' previously found
at fasta input record 9268 which has these geneIds: [ENSG00000197976,
ENSG00000197976.2].  Do the sequences match? true Partial old
sequence:
CCTTCCCCTCCCCTCCCCTCCTTTCCCTTCCCCTCCCCTCCTCTCCCTTCCCCTCCCCTCCTCTCCCTTCCCCTCCCTTCCCCTCCATTCCCCTCCCTTC Partial new sequence: CCTTCCCCTCCCCTCCCCTCCTTTCCCTTCCCCTCCCCTCCTCTCCCTTCCCCTCCCCTCCTCTCCCTTCCCCTCCCTTCCCCTCCATTCCCCTCCCTTC
NOTE 3th duplicate, at fasta input record 30730: [ENSG00000182162,
ENSG00000182162.2]. gene identifier 'ENSG00000182162' previously found
at fasta input record 9310 which has these geneIds: [ENSG00000182162,
ENSG00000182162.2].  Do the sequences match? true Partial old
sequence:
TTTATTTGTTTATTTATTTATTTTTTGAGACAGAGTTTCGCTCTTGTTGCCCAGGCTGGGGTGCAGCGGCATGATCTCGGCTCACTGCAACCTCCGCCTC Partial new sequence: TTTATTTGTTTATTTATTTATTTTTTGAGACAGAGTTTCGCTCTTGTTGCCCAGGCTGGGGTGCAGCGGCATGATCTCGGCTCACTGCAACCTCCGCCTC
NOTE 4th duplicate, at fasta input record 30798: [ENSG00000205681,
ENSG00000205681.1]. gene identifier 'ENSG00000205681' previously found
at fasta input record 10007 which has these geneIds: [ENSG00000205681,
ENSG00000205681.1].  Do the sequences match? true Partial old
sequence:
GCTATGGCGCTTGGCTACCTGAGTCTTTATTCTGCCTTCCAGGTGCTTGTTGGTTGGATAACTTTGGGTAGGTTCTTGTACCTCTTTGAGCTTCAAGACT Partial new sequence: GCTATGGCGCTTGGCTACCTGAGTCTTTATTCTGCCTTCCAGGTGCTTGTTGGTTGGATAACTTTGGGTAGGTTCTTGTACCTCTTTGAGCTTCAAGACT
NOTE 5th duplicate, at fasta input record 30820: [ENSG00000124343,
ENSG00000124343.2]. gene identifier 'ENSG00000124343' previously found
at fasta input record 7623 which has these geneIds: [ENSG00000124343,
ENSG00000124343.2].  Do the sequences match? true Partial old
sequence:
CTAATCTCCAGTGATCCGCTCACCTCAGCCACCCAAAGTGCTGGGATTACAGACGTGAGCCACCGGGCCCAGCCAGCAGGGCTGATTTCTTCTGATGCTG Partial new sequence: CTAATCTCCAGTGATCCGCTCACCTCAGCCACCCAAAGTGCTGGGATTACAGACGTGAGCCACCGGGCCCAGCCAGCAGGGCTGATTTCTTCTGATGCTG
NOTE 6th duplicate, at fasta input record 30844: [ENSG00000124333,
ENSG00000124333.4]. gene identifier 'ENSG00000124333' previously found
at fasta input record 19603 which has these geneIds: [ENSG00000124333,
ENSG00000124333.4].  Do the sequences match? true Partial old
sequence:
AGGAAAAATAGCTAATGCATGCTGGGCTTTAATACCTAGGTGATGGGTTGATAGGTGCAGCAAATTACCATGGCACACATTTACCTGTATAACAAACCTG Partial new sequence: AGGAAAAATAGCTAATGCATGCTGGGCTTTAATACCTAGGTGATGGGTTGATAGGTGCAGCAAATTACCATGGCACACATTTACCTGTATAACAAACCTG
NOTE 7th duplicate, at fasta input record 30934: [ENSG00000198223,
ENSG00000198223.3]. gene identifier 'ENSG00000198223' previously found
at fasta input record 8798 which has these geneIds: [ENSG00000198223,
ENSG00000198223.3].  Do the sequences match? true Partial old
sequence:
TCCTGCAGGAATGGGGAGGCTAAGACGGTAGAGGTGCAGCCTGGTCAGCCATCTTTCACCTTTGCTGATGTTGCTATCCAGGTGTTTTCCATTGCATGTG Partial new sequence: TCCTGCAGGAATGGGGAGGCTAAGACGGTAGAGGTGCAGCCTGGTCAGCCATCTTTCACCTTTGCTGATGTTGCTATCCAGGTGTTTTCCATTGCATGTG
NOTE 8th duplicate, at fasta input record 30968: [ENSG00000205755,
ENSG00000205755.1]. gene identifier 'ENSG00000205755' previously found
at fasta input record 9187 which has these geneIds: [ENSG00000205755,
ENSG00000205755.1].  Do the sequences match? true Partial old
sequence:
GACGGAGTCTTGCTCTTGTCGCCCAGGCTGGAGTGCCGTGGCACGATCTCAGCTCACTGCCAACTCCGCCTCCCGGGTTCACGCCATTCTCCTGCCTCAG Partial new sequence: GACGGAGTCTTGCTCTTGTCGCCCAGGCTGGAGTGCCGTGGCACGATCTCAGCTCACTGCCAACTCCGCCTCCCGGGTTCACGCCATTCTCCTGCCTCAG
NOTE 9th duplicate, at fasta input record 31013: [ENSG00000196433,
ENSG00000196433.2]. gene identifier 'ENSG00000196433' previously found
at fasta input record 9741 which has these geneIds: [ENSG00000196433,
ENSG00000196433.2].  Do the sequences match? true Partial old
sequence:
GCCAATATAGTGAAACCCTGTCTCTACGAAAAATACAAAAATTAGCCAGGTATGGTGGCAGGTGCTTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGAA Partial new sequence: GCCAATATAGTGAAACCCTGTCTCTACGAAAAATACAAAAATTAGCCAGGTATGGTGGCAGGTGCTTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGAA
NOTE 10th duplicate, at fasta input record 31022: [ENSG00000168939,
ENSG00000168939.2]. gene identifier 'ENSG00000168939' previously found
at fasta input record 20849 which has these geneIds: [ENSG00000168939,
ENSG00000168939.2].  Do the sequences match? true Partial old
sequence:
GAGACAGCCTGAGTCAGCCTGAGTTAAAATCCTAGATCTGCAAACTGCCAACTGTGTAACCTTGGACAAGTTACTTAAGGTCTTTGGACCTTGGTTTCTC Partial new sequence: GAGACAGCCTGAGTCAGCCTGAGTTAAAATCCTAGATCTGCAAACTGCCAACTGTGTAACCTTGGACAAGTTACTTAAGGTCTTTGGACCTTGGTTTCTC
NOTE 11th duplicate, at fasta input record 31055: [ENSG00000169100,
ENSG00000169100.3]. gene identifier 'ENSG00000169100' previously found
at fasta input record 7624 which has these geneIds: [ENSG00000169100,
ENSG00000169100.3].  Do the sequences match? true Partial old
sequence:
AGCCAGCCTCATCTGGAAATAGCAGCTCTGGTCCCGGCCTCGCTGAGGCACTGAAAACCAGCACCAGGGCCCCGTCCAGCCCGGCCTCGCTGAGGCTGGG Partial new sequence: AGCCAGCCTCATCTGGAAATAGCAGCTCTGGTCCCGGCCTCGCTGAGGCACTGAAAACCAGCACCAGGGCCCCGTCCAGCCCGGCCTCGCTGAGGCTGGG
NOTE 12th duplicate, at fasta input record 31115: [ENSG00000185291,
ENSG00000185291.3]. gene identifier 'ENSG00000185291' previously found
at fasta input record 8154 which has these geneIds: [ENSG00000185291,
ENSG00000185291.3].  Do the sequences match? true Partial old
sequence:
AGGCTGGTCTTGAACCCCTGACCTCAGGTGATGCACCCACCTTGGCCTCCCACAGAGCTGGGATTACAGGCGTGAGCCACTGGGCCCCGCCCTGTATTTG Partial new sequence: AGGCTGGTCTTGAACCCCTGACCTCAGGTGATGCACCCACCTTGGCCTCCCACAGAGCTGGGATTACAGGCGTGAGCCACTGGGCCCCGCCCTGTATTTG
NOTE 13th duplicate, at fasta input record 31130: [ENSG00000124334,
ENSG00000124334.6]. gene identifier 'ENSG00000124334' previously found
at fasta input record 19934 which has these geneIds: [ENSG00000124334,
ENSG00000124334.6].  Do the sequences match? true Partial old
sequence:
CTTTTCTCTTAAGCATGGGTGACATAGTACTCTTTCTTCATGTGTTTGATAAATTTGTTTTTATCTTAGAAATTGTGAATGGTATACATTGTTGAGACTG Partial new sequence: CTTTTCTCTTAAGCATGGGTGACATAGTACTCTTTCTTCATGTGTTTGATAAATTTGTTTTTATCTTAGAAATTGTGAATGGTATACATTGTTGAGACTG
NOTE 14th duplicate, at fasta input record 31198: [ENSG00000169084,
ENSG00000169084.3]. gene identifier 'ENSG00000169084' previously found
at fasta input record 9163 which has these geneIds: [ENSG00000169084,
ENSG00000169084.3].  Do the sequences match? true Partial old
sequence:
ATTACCTGAGGTCAGGAGTTTGAGACCAGCCAGGCCAACATGGTGAAATCCCATCTCTATTAAAAATACGAAAATTATTTGGGTGTGCTGGTGCATGCCT Partial new sequence: ATTACCTGAGGTCAGGAGTTTGAGACCAGCCAGGCCAACATGGTGAAATCCCATCTCTATTAAAAATACGAAAATTATTTGGGTGTGCTGGTGCATGCCT
NOTE 15th duplicate, at fasta input record 31327: [ENSG00000182484,
ENSG00000182484.4]. gene identifier 'ENSG00000182484' previously found
at fasta input record 19614 which has these geneIds: [ENSG00000182484,
ENSG00000182484.4].  Do the sequences match? true Partial old
sequence:
ATGCATTCAGAAAACTTTAGATCACGGTTGAGAAGAATCAAAAATATTAAATCAAATGCAGATACTCCTTGTTTAGGAGCAGTACACTCATTATTGTTAG Partial new sequence: ATGCATTCAGAAAACTTTAGATCACGGTTGAGAAGAATCAAAAATATTAAATCAAATGCAGATACTCCTTGTTTAGGAGCAGTACACTCATTATTGTTAG
NOTE 16th duplicate, at fasta input record 31342: [ENSG00000002586,
ENSG00000002586.7]. gene identifier 'ENSG00000002586' previously found
at fasta input record 8086 which has these geneIds: [ENSG00000002586,
ENSG00000002586.7].  Do the sequences match? true Partial old
sequence:
AGCCTGTACCCCAGAACTTAAAGTATAATAATAACAATAATAAAAAGACAGGTGTTATCTCAGAGCCCCTGACTCAGTCGGCTGGGCAGCAAGTATGCCA Partial new sequence: AGCCTGTACCCCAGAACTTAAAGTATAATAATAACAATAATAAAAAGACAGGTGTTATCTCAGAGCCCCTGACTCAGTCGGCTGGGCAGCAAGTATGCCA
NOTE 17th duplicate, at fasta input record 31373: [ENSG00000182378,
ENSG00000182378.3]. gene identifier 'ENSG00000182378' previously found
at fasta input record 8467 which has these geneIds: [ENSG00000182378,
ENSG00000182378.3].  Do the sequences match? true Partial old
sequence:
GACCACAGTCCACATCACACCAGGACACGGAGGAAGGGCCAGGCCTCATGACCACAGTCCAGATCACACCAGGACACAGAGGAAGGGCCGGGCCCTGTGA Partial new sequence: GACCACAGTCCACATCACACCAGGACACGGAGGAAGGGCCAGGCCTCATGACCACAGTCCAGATCACACCAGGACACAGAGGAAGGGCCGGGCCCTGTGA
NOTE 18th duplicate, at fasta input record 31428: [ENSG00000169093,
ENSG00000169093.5]. gene identifier 'ENSG00000169093' previously found
at fasta input record 9036 which has these geneIds: [ENSG00000169093,
ENSG00000169093.5].  Do the sequences match? true Partial old
sequence:
TATTCCTTGATTTCAGATGTCTGGGCTCCAGAGCTGTAATACAATTAAGTTTTGCTGTTTTAAGCCCCAGGGTTTTGAGTGACAGTTACCAGCAACCCCC Partial new sequence: TATTCCTTGATTTCAGATGTCTGGGCTCCAGAGCTGTAATACAATTAAGTTTTGCTGTTTTAAGCCCCAGGGTTTTGAGTGACAGTTACCAGCAACCCCC
NOTE 19th duplicate, at fasta input record 31442: [ENSG00000167393,
ENSG00000167393.7]. gene identifier 'ENSG00000167393' previously found
at fasta input record 9170 which has these geneIds: [ENSG00000167393,
ENSG00000167393.7].  Do the sequences match? true Partial old
sequence:
CCCAGCAAACTCTGCAACACCTCAGGCCCTGCCAGCCTTGGGGGCCCGACAGCACCTCTTTGTTCTCCCAGAGCAAAGCCTGCACGGAGTGGGCCCCCGG Partial new sequence: CCCAGCAAACTCTGCAACACCTCAGGCCCTGCCAGCCTTGGGGGCCCGACAGCACCTCTTTGTTCTCCCAGAGCAAAGCCTGCACGGAGTGGGCCCCCGG
NOTE 20th duplicate, at fasta input record 31485: [ENSG00000178605,
ENSG00000178605.4]. gene identifier 'ENSG00000178605' previously found
at fasta input record 9699 which has these geneIds: [ENSG00000178605,
ENSG00000178605.4].  Do the sequences match? true Partial old
sequence:
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN Partial new sequence: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NOTE 21th duplicate, at fasta input record 31508: [ENSG00000169098,
ENSG00000169098.5]. gene identifier 'ENSG00000169098' previously found
at fasta input record 9900 which has these geneIds: [ENSG00000169098,
ENSG00000169098.5].  Do the sequences match? true Partial old
sequence:
GCCGGGCACGGTGGCTCACGCCTGCAATGCCAGCACTTTAGGAGGCCGAGGTGGGCAGATCACCTGAGGTCAGGAGTTCGAGACCAGCCTGGCCAACATG Partial new sequence: GCCGGGCACGGTGGCTCACGCCTGCAATGCCAGCACTTTAGGAGGCCGAGGTGGGCAGATCACCTGAGGTCAGGAGTTCGAGACCAGCCTGGCCAACATG
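
For anyone who wants to run the same kind of check on their own export, here is a minimal sketch in Python. It is not the tool that produced the output above. It assumes the gene identifier is the first field of each FASTA header and that fields are separated by '|'; both the separator and the function names (parse_fasta, find_duplicates) are assumptions made for illustration, so adjust them to match your file.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_duplicates(records, id_sep="|"):
    """Return (gene_id, first_record, duplicate_record, sequences_match) tuples.

    Record numbers are 1-based. The gene ID is assumed to be the first
    id_sep-separated field of the header (an assumption, not a given).
    """
    seen = {}
    duplicates = []
    for index, (header, seq) in enumerate(records, start=1):
        gene_id = header.split(id_sep)[0]
        if gene_id in seen:
            first_index, first_seq = seen[gene_id]
            duplicates.append((gene_id, first_index, index, first_seq == seq))
        else:
            seen[gene_id] = (index, seq)
    return duplicates


if __name__ == "__main__":
    import sys

    # Usage: python find_dups.py export.fasta
    with open(sys.argv[1]) as handle:
        for gene_id, first, dup, same in find_duplicates(parse_fasta(handle)):
            print(gene_id, "first at record", first,
                  "again at record", dup, "sequences match:", same)
```

Keeping only the first record per gene ID would drop the repeated copies, but whether that is appropriate depends on whether the analysis needs the X and Y copies counted separately.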


For those interested, the results should still be available for a
little while:
http://www.biomart.org/biomart/martresults?file=martquery_0523154530_544.txt.gz

Ideas?

Thanks,

Peter Andrews


-- 
--------------
Peter Andrews
Computational Genetics Lab
Dartmouth Hitchcock Medical Center
(603) 653-3598
    
