Re: [Jalview-dev] dbSNP, GenBank, integration framework and i18n management

Jim Procter Mon, 13 Jan 2014 04:03:09 -0800

Hi David. I've cced this to the development list, so everyone can read 
about what you've been up to.

On Sat Jan 11 12:00:37 2014, David Roldán Martínez wrote:
> I've been working on this, taking a look at the Stockholm and Embl
> processes, but I cannot figure out what to do with the information
> once I have loaded the SNP file and GenBank information into my own
> class hierarchy.
Hmm. Just a comment here: you really should avoid setting up a class
hierarchy if you can - the overhead of creating lots of objects while
parsing is considerable. Jalview has quite an extensive annotation
datamodel, which will work for non-hierarchical sequence features, but
not for more complex compound/hierarchical features
(http://issues.jalview.org/browse/JAL-1191). For the full gory details
of this type of annotation, you need to read the documentation here:
http://www.insdc.org/files/feature_table.html#2

However, don't start hacking on this until we talk - there are some
very good examples of how to implement complex/compound feature
datamodels, and I'd prefer it if we first analyse those and work out
which one fits Jalview's needs best.

> _SNP loading
> I've been able to set up the Castor Maven plugin so that I can
> generate a Java library just by customizing a pom.xml and including
> an XSD (or set of XSDs). This way, I think we'll be able to widen
> Jalview's data loading to multiple sources quite easily. I can work on
> that (just tell me the XSD) but I really need to fully understand the
> Jalview datamodel. An E-R diagram or similar would be useful here. ;-)

Eek! We already have the castor source generation machinery bundled
with Jalview. By using a maven plugin, you risk breaking compatibility
with the bundled version of castor, which is NOT good. If you must use
castor XSD->Java, then take a look at the 'castorbinding' task in
build.xml - this already includes a set of XSDs that create Java
bindings for the Jalview project and colourscheme files, which are
critical for the Jalview desktop.

You should also bear in mind that currently, file formats dependent on
classes autogenerated with castor will not be available in the applet,
since XML parsing libraries are considered too heavyweight to ship to
the browser. This is the most significant reason for not using
XSD->Java object mapping, but there are other reasons for avoiding it:
e.g. when working with large XML files, stream XML processing avoids
the memory and object creation overhead incurred by creating an object
representation of every element in the document.
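
To illustrate the stream-processing point, here is a minimal sketch
using the JDK's own StAX API (javax.xml.stream). The class name, the
<snp> element and its 'id' and 'pos' attributes are all hypothetical
placeholders rather than the real dbSNP schema or any Jalview API. The
reader walks the document once and keeps only the two values it needs,
so no object tree is built for the rest of the file.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical example class; not part of Jalview. */
final class StreamingSnpReader {

    /** Flat value holder for one record: no nested object hierarchy. */
    static final class Snp {
        final String id;
        final int position;

        Snp(String id, int position) {
            this.id = id;
            this.position = position;
        }
    }

    /**
     * Reads every (hypothetical) snp element from the stream, ignoring
     * all other elements. Malformed XML is reported as an
     * IllegalArgumentException; a missing or non-numeric 'pos'
     * attribute surfaces as a NumberFormatException.
     */
    static List<Snp> read(Reader in) {
        List<Snp> result = new ArrayList<>();
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Do not process DTDs or resolve external entities from the input.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(
                XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        try {
            XMLStreamReader xml = factory.createXMLStreamReader(in);
            try {
                while (xml.hasNext()) {
                    // Pull the next event; only start tags named "snp" matter.
                    if (xml.next() == XMLStreamConstants.START_ELEMENT
                            && "snp".equals(xml.getLocalName())) {
                        String id = xml.getAttributeValue(null, "id");
                        String pos = xml.getAttributeValue(null, "pos");
                        result.add(new Snp(id, Integer.parseInt(pos)));
                    }
                }
            } finally {
                xml.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML input", e);
        }
        return result;
    }
}
```

In a real importer the values would presumably be handed straight to
Jalview's existing annotation datamodel instead of being collected in a
list; that mapping is deliberately left out here because it depends on
the datamodel discussion above.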

Re understanding Jalview's datamodel: I know an ER diagram would help,
but it will only get you so far, since you also need to think about how
the data that you are trying to import into Jalview is structured
(remember, the XML format may not necessarily correspond to the way
that the data might most usefully be handled in Jalview).

> _GenBank
> _
> I have parsed the file to get sequences and features. In this version
> of the patch (not the one attached at JIRA) I think I can translate
> sequences from the file to Jalview sequences (please check) but I
> don't know what happens with file headers and features. How can I
> inject these into the Jalview datamodel? What is the correspondence
> between them?

We are going to talk through this on our next Google Hangout.

> _Integration framework
>
> _
> I've been thinking about how to integrate Jalview with other tools and
> systems. In the e-learning domain there are several interesting
> initiatives whose approaches are worth examining.
> Take a look at these two: JISC E-learning Framework
> (http://www.jisc.ac.uk/whatwedo/programmes/elearningframework.aspx)
> and OKI (http://en.wikipedia.org/wiki/Open_Knowledge_Initiative). Both
> of them are based on the concept of services and service interfaces
> but don't force the use of any particular implementation. This offers
> better interoperability between platforms and is a good chance to
> make tool adoption grow. I'm going to work on this idea with a
> colleague, trying to put these ideas in a paper to see if it's
> accepted at RCIS (http://www.rcis-conf.com/rcis2014/). If you are
> interested in participating with us, let me know and I'll tell him.
> If you think this is a good idea, we can probably discuss this in
> detail and even open the discussion to more people.

You are quite right in recognising that Jalview would benefit from 
being part of an information integration framework. In fact, Jalview 
already includes a couple. VAMSAS is a prototype data and application 
integration framework for bioinformatics data that I developed in 
collaboration with some other groups. DAS is a much more widespread 
data integration framework based on XML/REST services that has been 
around since 2001 (http://www.biodas.org/wiki/Main_Page). It was 
developed for sequences and sequence features on genomes, but has been 
adapted to work with other types of data.

As you might imagine, I'm interested in integration frameworks, and
would be interested in discussing ideas with your colleague, though I
should say now that I already have enough deadlines for this year!

> _i18n management
> _
> I was wondering if it is possible to create two separate components
> at JIRA, one for bugs/FR/etc. related to i18n and another for
> translations. This way, if the issue is, for example, a mistake in a
> property bundle or a new language contribution, we'll mark it as
> Translation related. On the other hand, if the issue is something that
> doesn't work when you switch the language from English to French,
> we'll mark it as Internationalization related. What do you think
> about that?
Done. The Translations component is here:
http://issues.jalview.org/browse/JAL/component/10780

Jim.
