I'm not at all dependent on EMBL files. I only need a sequence with some annotation and some features. Any format that is parsable with BioJava would do. What alternatives are currently at hand?
Stein.
Arne Stabenau wrote:
Stein Aerts wrote:
OK, then we will wait until Monday.
I am indeed considering using ensj. Could you tell me how to construct an EMBL-formatted flat file of a gene (with some features of choice) using ensj? I couldn't find that in the documentation.
Ensj does not support EMBL flat file dumps, and I am seriously considering not adding them in the future for EnsEMBL. It is not a well-documented format, and it is occasionally very complicated. Is there a reason why you depend so much on EMBL files?
I would rather provide alternatives.
Arne
Regards and thanks a lot,
Stein.
Arne Stabenau wrote:
Hi Stein,
The EMBL export function on the current website worked when we released the site. For some reason the mistakes you spotted were introduced afterwards. We tested the new release website, which will come out next Monday, and it doesn't seem to have the problem (yet). So I would like to take the easy route for us and wait for the next release. We will, however, be careful not to reintroduce the bug on that one.
If there is any pressing reason for a fix earlier than that, please let us know. Please consider using ensj for what you want to do; it is as fast as the Perl code for most of what it does. It just doesn't give you BioJava objects.
Arne
Stein Aerts wrote:
The BioJava-Ensembl should be ideal. However, retrieving a gene with flanking sequence based on gene_stable_id using the code below takes a million years.
    Ensembl ens = new Ensembl(
        org.ensembl.db.sql.SQLDatabaseAdaptor.connectSQL(dbURL, dbUser, dbPass, dbSchemaVersion)
    );
    SequenceDB chromos = ens.getChromosomes();
    FeatureHolder transHolder = chromos.filter(
        new FeatureFilter.ByAnnotation("ensembl.gene", "ENSG00000167779")
    );
The output gives:
Querying: where contig_id = '592075'
Querying: where contig_id = '162233'
Querying: where contig_id = '162238'
Querying: where contig_id = '162241'
etc.
So that is not very efficient.
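To illustrate why the logged pattern above is slow, here is a minimal sketch contrasting one query per contig with a single batched clause. It is not taken from the BioJava-Ensembl source; the class and method names are hypothetical, and only the `contig_id` column name comes from the logged output.

```java
class ContigQuerySketch {
    // One WHERE clause per contig, mirroring the logged output:
    // each call implies a separate round trip to the database.
    static String perContigClause() {
        return "where contig_id = ?";
    }

    // A single WHERE clause with n placeholders, so that all contigs
    // can be fetched in one round trip via a PreparedStatement.
    static String batchedClause(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("need at least one contig id");
        }
        StringBuilder sb = new StringBuilder("where contig_id in (");
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append('?');
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // Four contig ids, as in the logged output.
        System.out.println(batchedClause(4));
    }
}
```

Whether the underlying schema and adaptor would allow such batching is an assumption; the point is only that the cost grows with the number of contigs when each one triggers its own query.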
Would there be an alternative here, similar to the export data function (based on any feature: gene, contig, clone, cDNA, peptide...), which runs via HTTP and is very fast?
If you want to see fast, construct URLs for the Mart and extract the data you want from the result ...
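Constructing such a URL could look roughly like the sketch below. The host, path, and parameter names are placeholders invented for illustration, not the real Mart interface; only the gene ID comes from this thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class MartUrlSketch {
    // Builds "base?k1=v1&k2=v2" with URL-encoded keys and values.
    static String buildUrl(String base, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep)
              .append(encode(e.getKey()))
              .append('=')
              .append(encode(e.getValue()));
            sep = '&';
        }
        return sb.toString();
    }

    // Form-style URL encoding; UTF-8 is always available on the JVM.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        // Hypothetical endpoint and parameter names, for illustration only.
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("gene_stable_id", "ENSG00000167779");
        params.put("flank", "1000");
        System.out.println(buildUrl("http://mart.example.org/export", params));
    }
}
```

The returned text would then be fetched over HTTP and parsed line by line; the actual parameters have to be taken from the Mart pages themselves.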
Stein.
Thomas Down wrote:
On Wed, Jan 29, 2003 at 09:58:18AM +0000, Ewan Birney wrote:
(c) If you don't like Perl (... this is the BioJava mailing list ...) then there is a pretty complete and stable Java binding to Ensembl. It doesn't use BioJava; it is more just a vanilla Java binding to Ensembl. Craig Melsopp is the lead on that. The web page is
http://www.ensembl.org/java/
(d) There's also a completely different BioJava-based mechanism
for accessing Ensembl databases:
http://biojava.org/pipermail/biojava-l/2002-December/003418.html
Unlike ensj, this is 100% read-only. It does give you access
without an additional API, though, and as far as I know it's the
only thing which supports multiple versions of the Ensembl database
schema off a single codebase.
Thomas.
-- Stein Aerts BioI@SISTA K.U.Leuven ESAT-SCD Belgium http://www.esat.kuleuven.ac.be/~dna/BioI