[Biojava-l] differences between read in sequence and stored sequence in database

Gabrielle Doan Mon, 27 Oct 2008 06:10:18 -0700

Hi all,

I have a BioSQL database which contains all human chromosomes. For my recent project I have to query for a part of a sequence. As far as I know, I can get the whole sequence from the entry Biosequence.Seq in the BioSQL schema. So I wrote this query:


SELECT SUBSTRING(bs.seq, 131615042, 131626262) FROM biosequence bs;

But this query didn't yield the desired string, because the length of this biosequence is only 100,000,020 bp. I am very confused about why I get such a discrepancy. I added all chromosomes to the database with the built-in BioJava method addRichSequence(RichSequence seq). From my raw data I know that this sequence should have a length of 140,279,252 bp. So where is the remaining part of my sequence? I have observed these discrepancies on all chromosomes which are longer than 100,000,020 bp.
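A side note on the query above, independent of the truncation: in MySQL and PostgreSQL the third argument of SUBSTRING is a length, not an end position, so a region given as start and end coordinates has to be converted first. The following is only a minimal sketch in Python, using an in-memory SQLite table as a stand-in for the real database (SQLite's substr has the same start/length semantics). The toy sequence and the helper names region_to_substring_args and fetch_region are made up for illustration and are not part of BioSQL or BioJava.

```python
import sqlite3


def region_to_substring_args(start, end):
    """Convert 1-based inclusive (start, end) coordinates to (start, length)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return start, end - start + 1


def fetch_region(conn, bioentry_id, start, end):
    """Return the bases from start to end (1-based, inclusive) for one entry."""
    pos, length = region_to_substring_args(start, end)
    row = conn.execute(
        "SELECT substr(seq, ?, ?) FROM biosequence WHERE bioentry_id = ?",
        (pos, length, bioentry_id),
    ).fetchone()
    return row[0] if row else None


# Toy stand-in for the biosequence table (only the columns used here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE biosequence (bioentry_id INTEGER PRIMARY KEY, seq TEXT)")
conn.execute("INSERT INTO biosequence VALUES (1, 'ACGTACGTAC')")

# Bases 3..6 of the toy sequence.
print(fetch_region(conn, 1, 3, 6))
# (start, length) arguments for the region used in the query above.
print(region_to_substring_args(131615042, 131626262))
```

For the region in the query, this gives a length of 11,221 rather than 131,626,262. The WHERE clause restricts the result to a single entry; without it, the query returns one row per stored biosequence.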


Here is an abstract of my database:
bioentry_id  description                                                         length
2            Homo sapiens mitochondrion, complete genome.                        16571
3            Homo sapiens chromosome Y, reference assembly, complete sequence.   57772954
4            Homo sapiens chromosome X, reference assembly, complete sequence.   100000020
5            Homo sapiens chromosome 22, reference assembly, complete sequence.  49691432
6            Homo sapiens chromosome 21, reference assembly, complete sequence.  46944323
7            Homo sapiens chromosome 20, reference assembly, complete sequence.  25960004
8            Homo sapiens chromosome 9, reference assembly, complete sequence.   100000020
9            Homo sapiens chromosome 7, reference assembly, complete sequence.   100000020

Sequences shorter than 100,000,020 bp are stored correctly under Biosequence.seq.
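One way to pin down which entries are affected is to compare the length each sequence should have, taken from the raw data, with the length of the string actually stored in the seq column. The following is only a rough sketch in Python, with an in-memory SQLite table standing in for the real database. The toy sequences, the expected lengths, and the helper name find_truncated are hypothetical; on the real database the same comparison could be made with LENGTH(seq).

```python
import sqlite3


def find_truncated(conn, expected_lengths):
    """Return {bioentry_id: (expected, stored)} for entries whose stored
    sequence length differs from the expected one."""
    mismatches = {}
    rows = conn.execute("SELECT bioentry_id, length(seq) FROM biosequence")
    for bioentry_id, stored in rows:
        expected = expected_lengths.get(bioentry_id)
        if expected is not None and stored != expected:
            mismatches[bioentry_id] = (expected, stored)
    return mismatches


# Toy data: entry 2 holds only 8 characters although 12 are expected.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE biosequence (bioentry_id INTEGER PRIMARY KEY, seq TEXT)")
conn.executemany(
    "INSERT INTO biosequence VALUES (?, ?)",
    [(1, "ACGTAC"), (2, "ACGTACGT")],
)

# Hypothetical expected lengths, e.g. taken from the source sequence files.
expected = {1: 6, 2: 12}
print(find_truncated(conn, expected))
```

Entries that appear in the result were shortened somewhere between reading the raw file and writing to the database. Entries that do not appear were stored at full length.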


I am grateful for any hints that explain the behaviour of my database.

Cheers,

Gabrielle
