On Fri, 5 Apr 2002, Tom Hudson wrote:

> A research group here at UNCW is starting a couple of bioinformatics
> projects in Java. I said, "look here, there's this open-source group on
> the web that's created a huge amount of code already, let's use it!"
>
> The responses I've gotten have been on the order of "Eww, 600+ classes.
> I can write my own parser faster than I can figure out what they're
> doing." and "We don't need data models anywhere near that complex." So
> why should we use BioJava? The "overview" on the web page hasn't
> convinced anybody here.

It all depends on what your group intends to do, really. If all it wants to do is read some EMBL files and do a few manipulations with them, then perhaps, yes, you could write a parser faster than you could go through the trouble of understanding BioJava. But even in this simple case, I suspect you'll end up writing that very same parser again and again because you want to add some functionality that was not envisaged originally. And debugging it. And fixing it when a format change occurs. And dealing with corner cases where some source doesn't quite follow the "standard" correctly.

If you wish to do more than that, the balance will shift rapidly. You may want to begin with just a sequence, some analysis on it, and some features marked out. Fine, it's not so difficult to hack out something like that. Then you want to do it over a set of sequences comprising a contig: your code needs support for assemblies then. You write that. You want to visualise it. Write some renderers. You want persistent data objects that can be stored in an SQL DB. Write some more code. You want data from some other site, in some other format and a different coordinate system. More code. The unspeakable ******** at the other end points you to his DAS server and says get the stuff you want from there. Write a DAS client. Your transcripts have exons (voila! nested feature). Your gene has multiple transcripts (hey! another nested feature). Need translations? More code. Need dynamic programming for an HMM implementation? More code. Need to do Blast/Fasta/HMMer output parsing? More code.

By this stage, you might as well have rewritten BioJava, which gives you all this out of the box. Today.

You need to exchange data with other labs that don't use Java? Fine, the OBF projects interoperate to a fair degree between Perl/Java/Ruby/Python/etc.

So anyway, I would say that it is difficult to predict what you need at the start. As you develop more and more custom code, it gets harder and harder to move to another system because of the legacy that is created, and that legacy in turn commits you to even more coding rather than to the research the coding is intended to support.

>
> (Caveats: right now we're a bunch of computer people and a bunch of
> biologists, with nobody really cross-trained; I understand that some day
> the biologists may start asking questions that require a nested-feature
> view of the world, but haven't convinced the other computer people to
> plan for that day yet, and the biologists can't think of any right now.)
>

Don't your biologists have any imagination? They should really be giving you guys a harder time. :-)

Might I suggest that it's not very wise to ask biologists to specify software? (Speaking as a biologist who codes too.) Look at what they do and try to get to the bottom of the question they are really asking. It's not software they want but solutions, and what you'll really want to do is offer them solutions that happen to be software. Otherwise, you'll just end up rewriting software repeatedly as those 5%%$%$^%$%$ biologists get you with feature creep. (Sheesh, don't you guys read Dilbert? At this rate, you might as well ask politicians to do sincerity... ;-) )

> On a closely related topic, I want to attend a BioJava Boot Camp. This
> year's appears to have been announced a month and a half ahead.
> Unfortunately, for some US federal grants, I have to schedule my travel
> a year ahead and more. When's the next Boot Camp?
>

I don't think it's planned way in advance. So far, it's been annual (this year's is only the second time it's been held).

I suspect it could be held more often, and even elsewhere in the world, if:

a) there's the demand,
b) there's a local organiser who can fix up accommodation for attendees and instructors, do admin and organisation, liaise with local buildings management people/networks & IT people, collect money, do accounting, etc.,
c) there's access to a teaching facility with enough computers and networking to do instruction in,
d) there are enough instructors (all the current ones do something else for a living...).

So far, the only place where all these conditions have been fulfilled is once a year at the EBI. (N.B. I'm not an authoritative source on this, but I think it is probably correct. Personally, I can't see it being held more than twice a year at most because of (d).)

OTOH, there's work behind the scenes to improve online instruction resources, so the need for a formal course may be reduced soon.

Regards,
David Huen

_______________________________________________
Biojava-l mailing list - [EMAIL PROTECTED]
http://biojava.org/mailman/listinfo/biojava-l