I was looking on the internet. The Java spec says nothing about an upper limit on String length; however, the Sun JDK implements String as a char[] behind the scenes. Java arrays are indexed by int, so on the Sun JDK with the right amount of RAM you could go up to 2^31 - 1 characters (except for string literals, as James points out), which is 2,147,483,647. So a string holding a sequence should be able to reach about 2 billion bases.
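For anyone who wants to check the arithmetic, here is a rough sketch. The class and method names (StringLimits, charArrayBytes, fitsInString) are made up for illustration and are not part of BioJava. It assumes 2 bytes per base, as in a char[]-backed String, and ignores object headers and whatever Hibernate adds on top. Some VMs cap array length slightly below Integer.MAX_VALUE, and newer JVMs may store Latin-1 strings more compactly, so treat the numbers as estimates only.

```java
public class StringLimits {
    // Java arrays are indexed by int, so an array-backed String
    // cannot be longer than Integer.MAX_VALUE (2^31 - 1) characters.
    static final long MAX_ARRAY_LENGTH = Integer.MAX_VALUE;

    // Assumption: one 2-byte char per base, no per-object overhead counted.
    static long charArrayBytes(long bases) {
        return bases * 2L;
    }

    // True if a sequence of this length could be held in a single String at all.
    static boolean fitsInString(long bases) {
        return bases >= 0 && bases <= MAX_ARRAY_LENGTH;
    }

    public static void main(String[] args) {
        // Default is the length reported in the loader log in this thread.
        long bases = args.length > 0 ? Long.parseLong(args[0]) : 5528445L;
        long heap = Runtime.getRuntime().maxMemory(); // roughly what -Xmx allows

        System.out.println("Upper bound on String length: " + MAX_ARRAY_LENGTH);
        System.out.println("Sequence length:              " + bases);
        System.out.println("Fits in one String:           " + fitsInString(bases));
        System.out.println("Bytes for the chars alone:    " + charArrayBytes(bases));
        System.out.println("JVM max heap (bytes):         " + heap);
        System.out.println("Chars fit in heap:            " + (charArrayBytes(bases) < heap));
    }
}
```

Run it with the same -Xmx setting you give the loader, e.g. java -Xmx1256m StringLimits 5528445. If the last line prints false, the String will never fit, whatever the length limit is.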
Of course, if you don't assign enough memory to the JVM (e.g. -Xmx4G) you won't be able to get close. Even if you can assign that much, it doesn't account for all the other Java overhead and all the stuff Hibernate is doing with proxy classes etc. Also, BioSQL usually defines the sequence as a CLOB, so depending on your DB implementation there may be a limit on that. On a 32-bit machine 4 GB is the most a single process can address, so you would have issues trying to do anything bigger. Anyhow, I know I have stored human chromosome 1 (approx. 250 million bases) in memory.

- Mark

On Fri, Jul 18, 2008 at 6:45 PM, James Carman <[EMAIL PROTECTED]> wrote:
> That is a limitation for string literals, not any string. Correct?
>
> On Fri, Jul 18, 2008 at 4:47 AM, Richard Holland
> <[EMAIL PROTECTED]> wrote:
>> In order to persist to BioSQL, BioJava has to convert the symbol list
>> into a string so that it can pass it to JDBC via Hibernate. Therefore
>> the maximum length of a sequence you wish to persist to BioSQL is the
>> maximum length of a string in Java, which is 65536 (2^16) if you are
>> working in a UTF-8 environment.
>>
>> 2008/7/18 Rey Vincent Babilonia <[EMAIL PROTECTED]>:
>>> Hi Mark,
>>>
>>> What is the maximum sequence length that a RichSequence can handle?
>>>
>>> java -Xms1024m -Xmx1256m -jar loader.jar
>>> .
>>> 16:09:00,173 INFO Loader:296 - D:\AE005174.gbk is readable.
>>> 16:09:06,704 INFO Loader:326 - Loading sequence AE005174 with identifier
>>> 56384585, length 5528445 and alphabet DNA...
>>> org.hibernate.PropertyAccessException: Exception occurred inside getter of
>>> org.biojavax.bio.seq.SimpleRichSequence.sequenceLength
>>>
>>> Rey Vincent Babilonia wrote:
>>>>
>>>> Hi Mark,
>>>>
>>>> At first it throws an out of memory exception. My workaround is to
>>>> subdivide the sequence file into individual GenBank files.
>>>>
>>>> The error now is that if a GenBank sequence has an 'empty alphabet', it
>>>> does not get loaded to BioSQL.
>>>> My workaround is to check if
>>>> sequence.getAlphabet().getName() is DNA.
>>>>
>>>> Thanks.
>>>>
>>>> Mark Schreiber wrote:
>>>>>
>>>>> Hi -
>>>>>
>>>>> Is the code throwing an exception or running out of memory??
>>>>>
>>>>> Can you send an example program and the problem you encounter to the
>>>>> list.
>>>>> - Mark
>>>>>
>>>>> On Thu, May 29, 2008 at 9:53 AM, Rey Vincent Babilonia
>>>>> <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>> -------- Original Message --------
>>>>>> Subject: large genbank data
>>>>>> Date: Wed, 28 May 2008 18:02:48 +0800
>>>>>> From: Rey Vincent Babilonia <[EMAIL PROTECTED]>
>>>>>> To: [EMAIL PROTECTED]
>>>>>>
>>>>>> hi,
>>>>>>
>>>>>> anybody tried uploading a large genbank data (e.g.
>>>>>> ftp://bio-mirror.net/biomirror/genbank/gbbct1.seq.gz) to biosql?
>>>>>> load_seqdatabase.pl of bioperl can do this. i'm switching to biojava and
>>>>>> it can't read the sequence (maybe because it has 30000+ sequences).
>>>>>>
>>>>>> thanks.
>>>>>>
>>>>>> --
>>>>>> /**
>>>>>> * @author Rey Vincent P. Babilonia
>>>>>> * @number +63 2 426 9760 local 1302
>>>>>> * @pgp 0x383454CF <at> pgp.mit.edu
>>>>>> * @project Philippine Bioinformatics Solutions
>>>>>> * @program Philippine e-Science Grid
>>>>>> * @division Research and Development Division
>>>>>> * @agency Advanced Science and Technology Institute
>>>>>> * @url http://www.psigrid.gov.ph
>>>>>> */
>>>>>>
>>>>>> _______________________________________________
>>>>>> biojava-dev mailing list
>>>>>> [EMAIL PROTECTED]
>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>>>
_______________________________________________
Biojava-l mailing list - [email protected]
http://lists.open-bio.org/mailman/listinfo/biojava-l
