Hi all,

I have a BioSQL database that contains all human chromosomes. For my current project I need to query for part of a sequence. As far as I know, the whole sequence is stored in the seq column of the biosequence table (Biosequence.seq) in the BioSQL schema, so I wrote this query:

SELECT SUBSTRING(bs.seq, 131615042, 131626262) FROM biosequence bs;
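A note on the query: SQL's SUBSTRING(str, pos, len) takes a start position and a length, not an end position. Assuming the two numbers are meant as 1-based inclusive start and end coordinates, the third argument would be 11221 rather than 131626262. This does not explain the missing bases, but it does affect what the query returns. A minimal Python sketch of the conversion follows; the helper names, the bioentry_id filter and the %s placeholder style are illustrative assumptions, not part of BioSQL or BioJava:

```python
def substring_args(start, end):
    """Convert 1-based inclusive (start, end) coordinates into the
    (position, length) pair that SQL SUBSTRING expects."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return start, end - start + 1


def build_region_query(bioentry_id, start, end):
    """Return a parameterised query and its parameters for one region.

    Hypothetical helper: the WHERE clause restricts the query to a single
    bioentry, which the original query did not do.
    """
    pos, length = substring_args(start, end)
    sql = (
        "SELECT SUBSTRING(bs.seq, %s, %s) "
        "FROM biosequence bs WHERE bs.bioentry_id = %s"
    )
    return sql, (pos, length, bioentry_id)


# Region from the query above, read as start/end coordinates (assumption).
position, length = substring_args(131615042, 131626262)
```

Without a WHERE clause the query returns one row per stored sequence, so restricting it to one bioentry_id keeps the result to the chromosome of interest.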

But this query did not return the desired string, because the stored biosequence is only 100,000,020 bp long. I am confused about where this discrepancy comes from. I added all chromosomes to the database with BioJava's built-in method addRichSequence(RichSequence seq). From my raw data I know that this sequence should be 140,279,252 bp long. So where is the remaining part of my sequence? I see the same discrepancy for all chromosomes that are longer than 100,000,020 bp.

Here is an excerpt from my database:
bioentry_id  description                                                          length
2            Homo sapiens mitochondrion, complete genome.                         16571
3            Homo sapiens chromosome Y, reference assembly, complete sequence.    57772954
4            Homo sapiens chromosome X, reference assembly, complete sequence.    100000020
5            Homo sapiens chromosome 22, reference assembly, complete sequence.   49691432
6            Homo sapiens chromosome 21, reference assembly, complete sequence.   46944323
7            Homo sapiens chromosome 20, reference assembly, complete sequence.   25960004
8            Homo sapiens chromosome 9, reference assembly, complete sequence.    100000020
9            Homo sapiens chromosome 7, reference assembly, complete sequence.    100000020

Sequences shorter than 100,000,020 bp are stored correctly in Biosequence.seq.
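The pattern in the listing can be checked mechanically. The following is a small Python sketch, using only the lengths listed above, that flags entries whose stored length has reached the observed 100,000,020 bp ceiling. The function name and the threshold constant are illustrative assumptions, and reaching the ceiling only suggests truncation, it does not prove it:

```python
# Ceiling observed in the listing; not a documented BioSQL or BioJava limit.
OBSERVED_CEILING = 100_000_020

# (bioentry_id, description, stored length) as listed in the excerpt.
rows = [
    (2, "Homo sapiens mitochondrion, complete genome.", 16571),
    (3, "Homo sapiens chromosome Y", 57772954),
    (4, "Homo sapiens chromosome X", 100000020),
    (5, "Homo sapiens chromosome 22", 49691432),
    (6, "Homo sapiens chromosome 21", 46944323),
    (7, "Homo sapiens chromosome 20", 25960004),
    (8, "Homo sapiens chromosome 9", 100000020),
    (9, "Homo sapiens chromosome 7", 100000020),
]


def suspected_truncated(entries, ceiling=OBSERVED_CEILING):
    """Return the bioentry_ids whose stored length has reached the ceiling."""
    return [entry_id for entry_id, _desc, length in entries if length >= ceiling]


flagged = suspected_truncated(rows)
```

Comparing each flagged entry's stored length with the length of the original input file would confirm whether bases were actually lost.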

I would be grateful for any hints that explain this behaviour of my database.

Cheers,

Gabrielle
_______________________________________________
Biojava-l mailing list  -  [email protected]
http://lists.open-bio.org/mailman/listinfo/biojava-l
