At 16:38 11/08/2007, Richard Apodaca wrote:

This is a very important topic, and an important one for us, as we need to
transfer algorithms to a non-Java environment (no, we are not giving
up Java). So, some thoughts:

>--- Noel O'Boyle <[EMAIL PROTECTED]> wrote:
>
> > Dear all,
> >
> > Because I'm such a lazy programmer, I was wondering whether we could
> > work together on implementing canonical forms of algorithms that can
> > be automatically translated to several target languages, so that a
> > single implementation can be shared between several programs.
>
>There's a lot of duplication of effort. For example,
>consider the various implementations of molfile and CML
>readers/writers. Each has its own quirks and
>maintenance issues. There's also duplication in
>descriptors.
>
>It all adds up to a lot of duplicated, low-payoff
>activities.

Yes. I think we need the following:
* declarative languages. We do part of this in JUMBO, as the abstract
classes are automatically generated from the schema. So if we want a
JUMBO++, about half the code can be generated algorithmically with
relative ease. The rest is harder.
* exposed data structures. In our polymer builder there are 4 
successive exposed data structures as the molecule gets built. This 
was an extremely useful technique - often we can look at the exposed 
XML without having to read the code.
* unit tests. Absolutely critical. They require the tools to be
fairly modular, and that in turn means we can work on subtasks in a relaxed atmosphere.
* data- and table-driven code. Structures such as nested ifs are often
better and more easily expressed as tables. I picked this up from the
MKM2007 meeting on maths. I think there is a lot of potential here.
For example, all our valency rules should be tabular.
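A minimal sketch of the table-driven idea, in Python for brevity (this is not JUMBO or CDK code). The allowed-valence table uses the OpenSMILES defaults for the organic-subset elements; the names `ALLOWED_VALENCES` and `implicit_hydrogens` are hypothetical. The rule becomes one loop over data rather than a cascade of nested ifs:

```python
# Hypothetical valence table (OpenSMILES organic-subset defaults).
# Changing a rule means editing data, not control flow.
ALLOWED_VALENCES = {
    "B": (3,),
    "C": (4,),
    "N": (3, 5),
    "O": (2,),
    "P": (3, 5),
    "S": (2, 4, 6),
    "F": (1,),
    "Cl": (1,),
    "Br": (1,),
    "I": (1,),
}


def implicit_hydrogens(element: str, bond_order_sum: int) -> int:
    """Return the number of implicit hydrogens for an atom.

    Picks the smallest allowed valence that is >= the sum of explicit
    bond orders; if no allowed valence fits, the atom gets none.
    """
    valences = ALLOWED_VALENCES.get(element)
    if valences is None:
        raise KeyError(f"no valence rule for element {element!r}")
    for valence in valences:  # each tuple is stored in ascending order
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0


if __name__ == "__main__":
    print(implicit_hydrogens("C", 1))   # methyl carbon: 4 - 1 = 3
    print(implicit_hydrogens("S", 3))   # next allowed valence is 4, so 1
    print(implicit_hydrogens("Cl", 2))  # exceeds every allowed valence, so 0
```

Supporting another element, or changing a valence, then becomes a one-line edit to the table, and the table could in principle be kept as shared data that Java, Python and C++ implementations all read.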



> > For example, Christoph Steinbeck (and colleagues?) has implemented a
> > 2D layout algorithm as part of CDK. Say we took this Java
> > implementation, changed it to accept a connection table (rather than a
> > CDKMolecule or whatever), and ran it through java2python
> > (http://code.google.com/p/java2python/). Then we would get a Python
> > implementation of the same algorithm at no cost. Bug fixes would be
> > applied upstream and we'd all benefit. All we'd need is a java2C++ and
> > it could be used by OpenBabel.
>
>IMO, Structure Diagram Generation (SDG) is the crown
>jewel of CDK. Whenever you're interfacing code that
>works with SMILES, IUPAC nomenclature (or InChI?) and
>end users, you're gonna need SDG. It's a use case that
>will only increase in importance as automated agents
>increasingly comb the Internet for chemical structure
>data (ChemSpider and Chemical Blogspace, for example).

Yes. We're also doing this with CrystalEye, and with some new
non-public, to-be-Open data which should come out soon.

>The only other way you'll get SDG is by paying big
>bucks. SDG is one of the most difficult (and
>practical) cheminformatics problems out there, and
>vendors know it.

I am having to do something in this area (I don't want to but...) and 
tend to agree.

I think social computing has great potential here, particularly for templates and numbering.


>From my perspective, the main problem with CDK's SDG
>code is its monolithic nature. The last time I
>checked, most of its methods contained dozens of lines
>of code and many contained six or more levels of
>nesting. In other words, I could see no way to extend
>it (or even fix bugs!) without a major refactoring.

I think that refactoring the key parts of CDK is probably the most
important (but tedious) task facing the Blue Obelisk (BO) community. We use SDG,
some of the SSS and some of the fingerprint stuff. We'd dearly like
to see it under Maven. I don't know whether this can be done
communally: can we have a virtual codefest and throw out those bits
that aren't core?

>This state of affairs severely limits the ability of
>the community to get involved in any way with
>something that, in terms of its functionality, is an
>excellent product.
>
>Regarding actually translating Java code into other
>languages - it's an interesting idea. What other
>alternatives exist? For example, Ruby has excellent
>support for directly using Java libraries without
>modification (both through JRuby and through Ruby Java
>Bridge) - what about Python and C/C++?

There are Java-to-C# converters; I don't know how well they work.

Whatever we do, clear, simple code will be easier to translate and
migrate. That means meaningful variable names, no overloaded operators,
limited chaining of arguments, etc. (we need the debug points!).
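As a hedged illustration of Noel's suggestion that a shared algorithm accept a plain connection table rather than a toolkit-specific molecule class, here is a Python sketch written in the plain style described above: named intermediates, simple loops, no operator overloading. `ConnectionTable`, `build_neighbour_lists` and `find_fragments` are hypothetical names; splitting a structure into connected fragments is just one small step a layout algorithm might need, not CDK's SDG code.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ConnectionTable:
    """Hypothetical toolkit-neutral molecule: element symbols plus bonds.

    Atoms are indexed from 0; each bond is
    (first_atom_index, second_atom_index, bond_order).
    """
    elements: list[str] = field(default_factory=list)
    bonds: list[tuple[int, int, int]] = field(default_factory=list)


def build_neighbour_lists(table: ConnectionTable) -> list[list[int]]:
    """Return, for each atom, the indices of the atoms bonded to it."""
    neighbours: list[list[int]] = [[] for _ in table.elements]
    for first_atom, second_atom, _bond_order in table.bonds:
        neighbours[first_atom].append(second_atom)
        neighbours[second_atom].append(first_atom)
    return neighbours


def find_fragments(table: ConnectionTable) -> list[list[int]]:
    """Split the table into connected fragments by breadth-first search.

    Plain loops and named variables keep every step visible in a
    debugger and straightforward to port to Java or C++ by hand.
    """
    neighbours = build_neighbour_lists(table)
    visited = [False] * len(table.elements)
    fragments: list[list[int]] = []
    for start_atom in range(len(table.elements)):
        if visited[start_atom]:
            continue
        fragment: list[int] = []
        queue = deque([start_atom])
        visited[start_atom] = True
        while queue:
            current_atom = queue.popleft()
            fragment.append(current_atom)
            for next_atom in neighbours[current_atom]:
                if not visited[next_atom]:
                    visited[next_atom] = True
                    queue.append(next_atom)
        fragments.append(fragment)
    return fragments


if __name__ == "__main__":
    # Sodium acetate as two fragments, as in the SMILES CC(=O)[O-].[Na+]
    sodium_acetate = ConnectionTable(
        elements=["C", "C", "O", "O", "Na"],
        bonds=[(0, 1, 1), (1, 2, 2), (1, 3, 1)],
    )
    print(find_fragments(sodium_acetate))  # prints [[0, 1, 2, 3], [4]]
```

Because the table carries no toolkit classes, the same signature could plausibly be reproduced in each target language, whether by hand or by a source-to-source translator.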


> > Ditto for a SMILES parser. Ditto for a 2D-to-3D
> > algorithm and so on.
> >
> > Does this sound crazy or really sensible?
> >

It sounds really sensible and a lot of work...

P.


Peter Murray-Rust
Unilever Centre for Molecular Sciences Informatics
University of Cambridge,
Lensfield Road,  Cambridge CB2 1EW, UK
+44-1223-763069 


_______________________________________________
Blueobelisk-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss

Reply via email to