On 5 May 2007, at 06:45, Aristotelis Tsirigos wrote:

I tried the following two queries using wget, one to get 5' UTR sequences and coordinates and the other to get the same for the 3' UTR. In the two result files, however, the order of the attributes is not the same. Is there any way to control the order, or do attributes appear in random order?


Hi Aristotelis,
I can't reproduce this problem. The order in both files appears the same to me:


Here are the queries (in this case, DATASET=dmelanogaster_gene_ensembl):


wget -q 'http://www.biomart.org/biomart/martservice?query=<?xml version="1.0" encoding="UTF-8"?> <Query virtualSchemaName = "default" header = "1" count = "" softwareVersion = "0.5" > <Dataset name = "'$DATASET'" interface = "default" > <Attribute name = "gene_stable_id" /> <Attribute name = "5utr" /> <Attribute name = "transcript_stable_id" /> <Attribute name = "str_chrom_name" /> <Attribute name = "transcript_chrom_strand" /> <Attribute name = "5utr_start" /> <Attribute name = "5utr_end" /> </Dataset> </Query>' -O 5utr.dat




5' UTR	Ensembl Gene ID	Ensembl Transcript ID	Chromosome	Strand	5 UTR Start (Chr bp)	5 UTR End (Chr bp)
Sequence unavailable	CG2657	CG2657-RA	2L	-1
CTACTCGCATGTAGAGATTTCCACTTATGTTTTCTCTACTTTCAGCAACCGAGAAGAGAACCCACGTTTGAACAAGTATCGGCGTGTGGACAACAGCTATCCCCGCTTCATAACGAATGAGGCTGCCGAGGACCTGATTTACAAGAAGTCC	CG11023	CG11023-RA	2L	1	7529	7679
CGCAGTTGAACGCAGGTTGAGCAGGAAGCTAGTCGAGACTATAATCCATATCTTGTCTGATCCTTTGTTCAAAACCACACTCCACCAACAATTTAGCCGACCGGAACTCGGGTTATAGCACTGCTCCCCCATTGCCCCTTCAAACTTCGAGTTACATATTACAAACTACCCATCAAC	CG2674	CG2674-RB	2L	1	107760;108588	107838;108685




wget -q 'http://www.biomart.org/biomart/martservice?query=<?xml version="1.0" encoding="UTF-8"?> <Query virtualSchemaName = "default" header = "1" count = "" softwareVersion = "0.5" > <Dataset name = "'$DATASET'" interface = "default" > <Attribute name = "gene_stable_id" /> <Attribute name = "3utr" /> <Attribute name = "transcript_stable_id" /> <Attribute name = "str_chrom_name" /> <Attribute name = "transcript_chrom_strand" /> <Attribute name = "3utr_start" /> <Attribute name = "3utr_end" /> </Dataset> </Query>' -O 3utr.dat




3' UTR	Ensembl Gene ID	Ensembl Transcript ID	Chromosome	Strand	3 UTR Start (Chr bp)	5 UTR End (Chr bp)
Sequence unavailable	CG2657	CG2657-RA	2L	-1
CAGTAGAATCACACAGCTACGCAAGAATGTGGAGAATCCAGTTTAGTTATTTTTACAAATCTTACGTAAACACTCCAAGCATGAATTCGCAACAAGTGCTTAGCTATTTAATTGAATTGAGCTGGCCGAGAGATGTGCTGGTGCAATAACTTGTTCTCATATCTGATTGTAACAGAGAATCTAGTTTTTCAATAAAATTTCCCCAAGTAAAAACA	CG11023	CG11023-RA	2L	1	9277	7679


and that is:

sequence type (3utr or 5utr), gene_stable_id, transcript_stable_id, str_chrom_name, transcript_chrom_strand, 3utr_start (or 5utr_start), 3utr_end (or 5utr_end)

The order of attributes is determined by the order in which you specify them in your query, with only one exception: the sequence attribute (here 5utr or 3utr) takes precedence over all other attributes and is always written as the first column. This is probably what caused the confusion. The actual column order is reflected in the display names of your attributes in the header line of each file.
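If you would rather have the sequence column back in the position you requested it, one option is to post-process the result file. The sketch below assumes the result is tab-separated with one record per line and that the sequence is always the first column, as described above; `move_seq_column` is a hypothetical helper name, not part of BioMart.

```shell
# Hypothetical helper (not part of BioMart): move the sequence column, which
# BioMart writes first, to position POS. Assumes tab-separated input with one
# record per line; the header line is reordered in the same way as the data.
# Usage: move_seq_column POS < in.tsv > out.tsv
move_seq_column() {
    awk -F '\t' -v pos="$1" '
    BEGIN { OFS = "\t" }
    {
        line = ""
        for (k = 1; k <= NF; k++) {
            if (k < pos)       f = $(k + 1)  # fields before the target slot shift left by one
            else if (k == pos) f = $1        # the sequence column goes into the target slot
            else               f = $k        # fields after the slot keep their original index
            line = (k == 1) ? f : line OFS f
        }
        print line
    }'
}
```

For the queries above, where the sequence was requested second, `move_seq_column 2 < 5utr.dat > 5utr_reordered.dat` would put the Ensembl Gene ID column first again, followed by the sequence and then the remaining attributes in query order.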

hope that helps,
a.



-------------------------------------------------------------------------------
Arek Kasprzyk
EMBL-European Bioinformatics Institute.
Wellcome Trust Genome Campus, Hinxton,
Cambridge CB10 1SD, UK.
Tel: +44-(0)1223-494606
Fax: +44-(0)1223-494468
-------------------------------------------------------------------------------


