On 5 May 2007, at 06:45, Aristotelis Tsirigos wrote:
I tried the following two queries using wget, one to get 5'UTR
sequences and coordinates and the other to get the 3'UTR. In the two
result files however, the order of the attributes is not the same. Is
there any way to control the order, or do attributes appear in random
order?
Hi Aristotelis,
I can't reproduce this problem. The order in both files appears the
same to me:
Here are the queries (in this case,
DATASET=dmelanogaster_gene_ensembl):
wget -q 'http://www.biomart.org/biomart/martservice?query=<?xml
version="1.0" encoding="UTF-8"?> <Query virtualSchemaName = "default"
header = "1" count = "" softwareVersion = " 0.5" > <Dataset name =
"'$DATASET'" interface = "default" > <Attribute name =
"gene_stable_id" /> <Attribute name = "5utr" /> <Attribute name =
"transcript_stable_id" /> <Attribute name = "str_chrom_name" />
<Attribute name = "transcript_chrom_strand" /> <Attribute name =
"5utr_start" /> <Attribute name = "5utr_end" /> </Dataset> </Query>'
-O 5utr.dat
5' UTR Ensembl Gene ID Ensembl Transcript ID Chromosome Strand
5 UTR Start (Chr bp) 5 UTR End (Chr bp)
Sequence unavailable CG2657 CG2657-RA 2L -1
CTACTCGCATGTAGAGATTTCCACTTATGTTTTCTCTACTTTCAGCAACCGAGAAGAGAACCCACGTTTGAA
CAAGTATCGGCGTGTGGACAACAGCTATCCCCGCTTCATAACGAATGAGGCTGCCGAGGACCTGATTTACAA
GAAGTCC CG11023 CG11023-RA 2L 1 7529 7679
CGCAGTTGAACGCAGGTTGAGCAGGAAGCTAGTCGAGACTATAATCCATATCTTGTCTGATCCTTTGTTCAA
AACCACACTCCACCAACAATTTAGCCGACCGGAACTCGGGTTATAGCACTGCTCCCCCATTGCCCCTTCAAA
CTTCGAGTTACATATTACAAACTACCCATCAAC CG2674 CG2674-RB
2L 1 107760;108588 107838;108685
wget -q 'http://www.biomart.org/biomart/martservice?query=<?xml
version="1.0" encoding="UTF-8"?> <Query virtualSchemaName = "default"
header = "1" count = "" softwareVersion = " 0.5" > <Dataset name =
"'$DATASET'" interface = "default" > <Attribute name =
"gene_stable_id" /> <Attribute name = "3utr" /> <Attribute name =
"transcript_stable_id" /> <Attribute name = "str_chrom_name" />
<Attribute name = "transcript_chrom_strand" /> <Attribute name =
"3utr_start" /> <Attribute name = "3utr_end" /> </Dataset> </Query>'
-O 3utr.dat
3' UTR Ensembl Gene ID Ensembl Transcript ID Chromosome Strand
3 UTR Start (Chr bp) 5 UTR End (Chr bp)
Sequence unavailable CG2657 CG2657-RA 2L -1
CAGTAGAATCACACAGCTACGCAAGAATGTGGAGAATCCAGTTTAGTTATTTTTACAAATCTTACGTAAACA
CTCCAAGCATGAATTCGCAACAAGTGCTTAGCTATTTAATTGAATTGAGCTGGCCGAGAGATGTGCTGGTGC
AATAACTTGTTCTCATATCTGATTGTAACAGAGAATCTAGTTTTTCAATAAAATTTCCCC
AAGTAAAAACA CG11023 CG11023-RA 2L 1 9277 7679
and that is:
sequence_type (3utr or 5utr), gene_stable_id, transcript_stable_id,
str_chrom_name, transcript_chrom_strand, 3(5)utr_start, 3(5)utr_end
The order of attributes is determined by the order in which you specify
them in your query, with one exception: the sequence type (here 5utr or
3utr) takes precedence over the other attributes and is placed first.
This is probably what is causing the confusion. The actual column order
is reflected by the display names of your attributes in the header line
of each file.
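
To make the rule concrete, here is a rough Python sketch that predicts the column order from the order of attributes in a query. The function name and the SEQUENCE_ATTRIBUTES set are made up for illustration, and the set only contains the two sequence types used in this thread, so treat it as an assumption rather than a complete list for any mart.

```python
# Attribute names treated as sequence types. Only these two appear in this
# thread; the full list for a given mart is an assumption you should verify.
SEQUENCE_ATTRIBUTES = {"5utr", "3utr"}


def expected_column_order(requested):
    """Return attribute names in the order the result columns should appear.

    A sequence attribute (if present) is moved to the front; all other
    attributes keep the order in which they were requested.
    """
    sequences = [name for name in requested if name in SEQUENCE_ATTRIBUTES]
    others = [name for name in requested if name not in SEQUENCE_ATTRIBUTES]
    return sequences + others


# Attribute order exactly as written in the 5' UTR query above.
query_order = [
    "gene_stable_id", "5utr", "transcript_stable_id", "str_chrom_name",
    "transcript_chrom_strand", "5utr_start", "5utr_end",
]

print(expected_column_order(query_order))
```

For the 5' UTR query this puts 5utr first and leaves the remaining attributes in their requested order, which matches the header line of 5utr.dat. If you post-process the files, reading columns by their header names rather than by position avoids depending on this rule at all.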
hope that helps,
a.
-------------------------------------------------------------------------------
Arek Kasprzyk
EMBL-European Bioinformatics Institute.
Wellcome Trust Genome Campus, Hinxton,
Cambridge CB10 1SD, UK.
Tel: +44-(0)1223-494606
Fax: +44-(0)1223-494468
-------------------------------------------------------------------------------