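To persist a sequence to BioSQL, BioJava has to convert the symbol list into a string so that it can pass it to JDBC via Hibernate. The maximum length of a sequence you can persist to BioSQL is therefore bounded by the maximum length of a Java String. A String is indexed by int, so that bound is 2^31 - 1 characters, and in practice the available heap is usually the tighter limit. The 65535 (2^16 - 1) figure is the byte limit for a UTF-8-encoded string constant in a class file or for a single DataOutputStream.writeUTF call; it does not apply to strings built at runtime.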
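As a quick check of the length limit discussed above, here is a minimal sketch that uses only the JDK, not BioJava or Hibernate. It builds a placeholder string as long as the sequence in the quoted log (5528445 bases) and then tries to write it with DataOutputStream.writeUTF, which is one place where a 65535-byte limit does apply. The class and method names (StringLimitCheck, dummySequence, fitsInWriteUTF) are made up for illustration, and the check says nothing about what the database column or JDBC driver will accept.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical class for illustration; not part of BioJava.
class StringLimitCheck {

    /** Builds a placeholder DNA string of the given length (all 'a'). */
    static String dummySequence(int length) {
        char[] bases = new char[length];
        Arrays.fill(bases, 'a');
        return new String(bases);
    }

    /**
     * Returns true if DataOutputStream.writeUTF accepts the string.
     * writeUTF rejects anything whose encoded form exceeds 65535 bytes.
     */
    static boolean fitsInWriteUTF(String s) {
        try (DataOutputStream out =
                 new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(s);
            return true;
        } catch (UTFDataFormatException tooLong) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Same length as the AE005174 sequence in the quoted log.
        String seq = dummySequence(5_528_445);
        System.out.println("length = " + seq.length());
        System.out.println("fits in writeUTF: " + fitsInWriteUTF(seq));
        System.out.println("65535 chars fit in writeUTF: "
                + fitsInWriteUTF(dummySequence(65_535)));
    }
}
```

A plain in-memory String of this length is created without trouble; only the writeUTF encoding refuses it. If a loader fails at this size, the heap settings, the Hibernate mapping, and the column type of the sequence table are the places to look.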
2008/7/18 Rey Vincent Babilonia <[EMAIL PROTECTED]>:
> Hi Mark,
>
> What is the maximum sequence length that a RichSequence can handle?
>
> java -Xms1024m -Xmx1256m -jar loader.jar
> .
> 16:09:00,173 INFO Loader:296 - D:\AE005174.gbk is readable.
> 16:09:06,704 INFO Loader:326 - Loading sequence AE005174 with identifier
> 56384585, length 5528445 and alphabet DNA...
> org.hibernate.PropertyAccessException: Exception occurred inside getter of
> org.biojavax.bio.seq.SimpleRichSequence.sequenceLength
>
> Rey Vincent Babilonia wrote:
>>
>> Hi Mark,
>>
>> At first it throws an out of memory exception. My workaround is to
>> subdivide the sequence file into individual GenBank files.
>>
>> The error now is that if a GenBank sequence has an 'empty alphabet', it
>> does not get loaded to BioSQL. My workaround is to check if
>> sequence.getAlphabet().getName() is DNA.
>>
>> Thanks.
>>
>> Mark Schreiber wrote:
>>>
>>> Hi -
>>>
>>> Is the code throwing an exception or running out of memory??
>>>
>>> Can you send an example program and the problem you encounter to the
>>> list.
>>> - Mark
>>>
>>> On Thu, May 29, 2008 at 9:53 AM, Rey Vincent Babilonia
>>> <[EMAIL PROTECTED]> wrote:
>>>>
>>>> -------- Original Message --------
>>>> Subject: large genbank data
>>>> Date: Wed, 28 May 2008 18:02:48 +0800
>>>> From: Rey Vincent Babilonia <[EMAIL PROTECTED]>
>>>> To: [EMAIL PROTECTED]
>>>>
>>>> hi,
>>>>
>>>> anybody tried uploading a large genbank data (e.g.
>>>> ftp://bio-mirror.net/biomirror/genbank/gbbct1.seq.gz) to biosql?
>>>> load_seqdatabase.pl of bioperl can do this. i'm switching to biojava and
>>>> it can't read the sequence (maybe because it has 30000+ sequences).
>>>>
>>>> thanks.
>>>>
>>>> --
>>>> /**
>>>>  * @author Rey Vincent P. Babilonia
>>>>  * @number +63 2 426 9760 local 1302
>>>>  * @pgp 0x383454CF <at> pgp.mit.edu
>>>>  * @project Philippine Bioinformatics Solutions
>>>>  * @program Philippine e-Science Grid
>>>>  * @division Research and Development Division
>>>>  * @agency Advanced Science and Technology Institute
>>>>  * @url http://www.psigrid.gov.ph
>>>>  */
>>>>
>>>> _______________________________________________
>>>> biojava-dev mailing list
>>>> [EMAIL PROTECTED]
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev

_______________________________________________
Biojava-l mailing list - [email protected]
http://lists.open-bio.org/mailman/listinfo/biojava-l
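
The workaround quoted in the thread (subdividing the sequence file into individual GenBank files) can be sketched with the JDK alone, assuming the usual GenBank flat-file convention that each record ends with a line containing only "//". GenBankSplitter and splitRecords are hypothetical names, not BioJava API, and this is not the loader used in the thread. The sketch works on text already in memory; a file the size of gbbct1.seq would need the same logic applied line by line from a reader. Any header text before the first LOCUS line stays attached to the first record.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of BioJava.
class GenBankSplitter {

    /**
     * Splits multi-record GenBank text into one string per record.
     * A record is taken to end at a line that contains only "//".
     */
    static List<String> splitRecords(String text) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : text.split("\\r?\\n")) {
            current.append(line).append('\n');
            if (line.trim().equals("//")) {
                // Record terminator reached: emit the record and start a new one.
                records.add(current.toString());
                current.setLength(0);
            }
        }
        // Keep an unterminated trailing record, but drop trailing blank lines.
        if (!current.toString().trim().isEmpty()) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "LOCUS       A\nORIGIN\n        1 acgt\n//\n"
                    + "LOCUS       B\nORIGIN\n        1 ttga\n//\n";
        List<String> records = splitRecords(text);
        System.out.println(records.size() + " records");
    }
}
```

Each returned string can then be written to its own file or passed to the parser one at a time, which keeps only a single record in memory during loading.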
