Hello. Your code is pretty good already - but you're right, it will load the whole chromosome into memory before you can chop out the interesting bit you actually need.
As you observed, using ThinRichSequence in your query loads only the initial shell of a sequence object to start with, but the moment you try to sub-sequence it, it will immediately load the whole sequence data into memory in order to perform the operation.

If you only want the sequence data, as a string, you can do this by specifying the sequence attribute in the query and bypassing the sequence object entirely:

  select rs.stringSequence from Sequence as rs where rs.description like '%hromosome :num%'

This will return a String instead of a RichSequence object. You can use HQL operators to perform substrings etc. on the string inside the query itself - see http://docs.huihoo.com/hibernate/hibernate-reference-3.2.1/queryhql.html , particularly section 14.9.

If you only want the features, you can do this by using the BioSQLFeatureFilter technique. In particular you will want the BySequenceName filter, the And filter, and the OverlapsRichLocation filter. You construct a filter, then pass it to the filter() method in BioSQLRichSequenceDB. The database will return to you all the RichFeature objects that match your criteria. Note that it searches the whole database, so you really must use a BySequenceName filter at the very least in order to make the results useful!

However, you can't use HQL to construct a complete slice of a sequence directly in the database before returning it to the program for use as a ready-made RichSequence object. This would require Hibernate to know what a BioJava sub-sequence object is and how it behaves in relation to an 'unsliced' one, which is beyond the scope of its job as a persistence framework.

cheers,
Richard

2008/10/7 Gabrielle Doan <[EMAIL PROTECTED]>:
> Hi all,
> I have a BioSQL database which contains all human chromosomes. My intention
> is to get the information about a particular gene. How can I get a part of a
> particular chromosome with all associated features?
> At the moment I use the following code to create my new sequence:
>
> <code>
> RichSequence subSeq = RichSequence.Tools.subSequence(parent,
>         position[0], position[1], ns, geneName, parent.getAccession(),
>         parent.getIdentifier(), parent.getVersion() + 1,
>         (Double) (parent.getVersion() + 1.0));
> <\code>
>
> Here is the part where I get the parent sequence:
> <code>
> public static RichSequence getChromosome(String chrNo) {
>     Transaction tx = session.beginTransaction();
>     RichSequence ret = null;
>
>     String query;
>
>     try {
>         if (chrNo.equals("MT")) {
>             query = "from BioEntry as be where be.description like '%:num%'";
>             query = query.replaceAll(":num", "mitochondrion");
>         } else {
>             query = "from BioEntry as be where be.description like '%hromosome :num%'";
>             query = query.replaceAll(":num", chrNo);
>         }
>
>         Query q = session.createQuery(query);
>
>         ret = (RichSequence) q.list().get(0);
>         tx.commit();
>     } catch (Exception e) {
>         tx.rollback();
>         e.printStackTrace();
>     }
>     return ret;
> }
> <\code>
>
> I always have to load the whole chromosome to get a part of it, so it takes
> a very long time and I get a lot of unused information (waste of memory). I
> also tried to use <code>ThinRichSequence<\code> instead of
> <code>RichSequence<\code>, but I didn't notice any difference.
> Can you give me a hint how to accelerate the code?
> I am grateful for any hints.
>
> cheers,
> Gabrielle
> _______________________________________________
> Biojava-l mailing list - [email protected]
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

--
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: [EMAIL PROTECTED]
http://www.eaglegenomics.com/
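
A minimal sketch of the string-only approach from the reply, with the substring taken inside the query. The Sequence entity and its stringSequence and description properties come from the query in the reply; the class name SliceQuerySketch, its helper methods, and the parameter names :start, :len and :desc are hypothetical and only for illustration. It assumes HQL's substring(string, start, length) with a 1-based start, and that your database can apply substring to the column type holding the sequence, which I have not checked for every backend. The class only builds the query text and its parameter values, so it runs without Hibernate.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SliceQuerySketch {

    // Query from the reply, with substring() applied inside the database.
    // Assumption: substring(string, start, length), start counted from 1.
    static final String HQL =
        "select substring(rs.stringSequence, :start, :len) "
      + "from Sequence as rs where rs.description like :desc";

    // Same description patterns as the getChromosome() method in the question.
    static String descriptionPattern(String chrNo) {
        return "MT".equals(chrNo) ? "%mitochondrion%" : "%hromosome " + chrNo + "%";
    }

    // Turns a 1-based, inclusive start/end range into named parameter values.
    static Map<String, Object> parameters(String chrNo, int start, int end) {
        if (start < 1 || end < start) {
            throw new IllegalArgumentException("need 1 <= start <= end");
        }
        Map<String, Object> p = new LinkedHashMap<>();
        p.put("start", start);
        p.put("len", end - start + 1);
        p.put("desc", descriptionPattern(chrNo));
        return p;
    }

    public static void main(String[] args) {
        System.out.println(HQL);
        System.out.println(parameters("7", 1000, 1999));
        // With a Hibernate Session at hand, the values would be bound roughly so:
        //   Query q = session.createQuery(HQL);
        //   q.setInteger("start", 1000);
        //   q.setInteger("len", 1000);
        //   q.setString("desc", "%hromosome 7%");
        //   String slice = (String) q.list().get(0);
    }
}
```

Binding the values as named parameters instead of using replaceAll on the query text keeps the chromosome name out of the HQL string itself. The result is a plain String, so the features still have to be fetched separately with the filter approach.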
