On Wed, Feb 20, 2002 at 01:52:10PM -0500, Marc Colosimo wrote:
> Hi,
>
> Is there any information about using the BioSQL classes in BioJava, such
> as the schema for the database or examples in using it? I am interest in
> using postgre and biojava to store lots of sequence data.
BioSQL is based on bioperl-db. There's a little bit about it in the document from the first (O'Reilly) hackathon meeting:

  http://www.technophage.com/open-bio-database.pdf

The BioJava code is quite new. I've got a little tutorial planned, but I'm afraid (ahem) it's not written yet. In the meantime, the code is integrated into the main trunk of biojava-live (although it didn't quite make it into 1.2), and hopefully shouldn't be too problematic to use (touch wood!).

You can get schemas (MySQL and PostgreSQL) from:

  http://www.biojava.org/download/biosql/

At the moment there are actually two PostgreSQL schemas: one was auto-generated from the MySQL one, the other was hand-edited by me (identified by the -thomasd suffix). For now I'd advise the hand-edited version, but the duplication should go away once the automated conversion has been perfected.

If you're using PostgreSQL, note the following:

  - You need at least version 7.1; previous versions didn't support storing large strings in normal table attributes.

  - There's a file of stored procedures (biosqlprocs.sql) which you can load into the database after loading the schema. These are auto-detected by the BioJava code, and can increase write performance significantly (by a factor of 3 on my test setup).

On the BioJava side, there isn't really any API for BioSQL as such. You can just do something like:

    SequenceDB seqs = new BioSQLSequenceDB(
        "jdbc:postgresql://dbbox.mydomain.org/biosql_db",
        "username",
        "password",
        "database-name",
        true
    );

The first three arguments are just standard JDBC-style database connection details (URL, user name, password). There's a `database name' parameter because BioSQL allows each `physical' SQL database to contain a number of `logical' databases. Perhaps namespace would be a better term for these (but hey, I didn't write the original schema). The final argument specifies whether the namespace should be created if it doesn't already exist.
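To make the physical/logical split and the create flag concrete, here is a minimal, self-contained sketch of the lookup semantics of the last two constructor arguments. It is a hypothetical in-memory model, not BioJava code: PhysicalDatabase, Namespace and open(...) are made-up names, and the real BioSQLSequenceDB keeps namespaces as rows in the schema's biodatabase table rather than in a HashMap.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical model (NOT part of BioJava): one "physical" SQL database
// holding several named "logical" databases, i.e. namespaces.
class PhysicalDatabase {
    private final Map<String, Namespace> namespaces = new HashMap<>();

    // Mirrors the meaning of the final constructor argument: when create is
    // true, a missing namespace is created; otherwise the lookup fails.
    Namespace open(String name, boolean create) {
        Namespace ns = namespaces.get(name);
        if (ns == null) {
            if (!create) {
                throw new NoSuchElementException("No such namespace: " + name);
            }
            ns = new Namespace(name);
            namespaces.put(name, ns);
        }
        return ns;
    }

    int namespaceCount() {
        return namespaces.size();
    }
}

// Hypothetical namespace: in this model, sequence IDs are scoped to one
// namespace, so two namespaces may reuse the same ID independently.
class Namespace {
    final String name;
    final Map<String, String> sequencesById = new HashMap<>();

    Namespace(String name) {
        this.name = name;
    }
}
```

Passing true is handy for a first load; passing false guards against a mistyped namespace name silently creating a new, empty namespace.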
Note that right now the BioJava code won't create the actual SQL database or load the schema for you. You'll have to do this manually using your database's normal tools.

Having connected to the database, you can write complete Sequence entries using the addSequence(Sequence) method, and retrieve sequences by ID using the getSequence(String) method. Objects extracted by this method retain live connections to the database. Alterations to the sequence (for instance, using the createFeature(Feature.Template) method) are immediately reflected in the database (in a transactionally safe manner, if the database supports this; PostgreSQL does). So they're true persistent implementations of the BioJava interfaces. The aim is to have everything work just like in-memory SequenceDB, Sequence, and Feature objects. For many purposes, BioSQL is now pretty close to this ideal.

Basic BioSQL doesn't support hierarchical features, so these get flattened when adding a sequence to a database (and attempts to create new child features on a BioSQL sequence will fail). However, I've got an /experimental/ extension for handling this: there's an extra table (seqfeature_hierarchy) in my schema. Once again, this is auto-detected by the client code and used if available.

Let me know how you get on,

Thomas.

_______________________________________________
Biojava-l mailing list - [EMAIL PROTECTED]
http://biojava.org/mailman/listinfo/biojava-l