Hi Marcel -

One possible solution would be to customise the handler and the parser so that they can talk to each other and the handler can make callbacks to the parser.
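As a rough illustration of a handler signalling "stop" to whatever is driving the parse, here is a minimal sketch. It uses the JDK's standard SAX parser on a tiny BLAST-XML-like string, not BlastLikeSAXParser, so whether the BioJava parser passes a handler's exception through in the same way is an assumption you would need to check. The class names (EarlyStopDemo, StopParsingException, TargetQueryHandler) and the sample data are made up for the example. The handler throws a marker exception once the contig of interest has been read completely, and the caller catches it and carries on, so no System.exit() and no threads are needed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EarlyStopDemo {

    // Made-up sample in the shape of BLAST XML output (heavily trimmed).
    static final String SAMPLE =
          "<BlastOutput><BlastOutput_iterations>"
        + "<Iteration><Iteration_query-def>contig_1</Iteration_query-def>"
        + "<Iteration_hits><Hit><Hit_id>gene_A</Hit_id></Hit>"
        + "<Hit><Hit_id>gene_B</Hit_id></Hit></Iteration_hits></Iteration>"
        + "<Iteration><Iteration_query-def>contig_2</Iteration_query-def>"
        + "<Iteration_hits><Hit><Hit_id>gene_C</Hit_id></Hit></Iteration_hits></Iteration>"
        + "</BlastOutput_iterations></BlastOutput>";

    /** Hypothetical marker exception: thrown by the handler to abort the parse. */
    static class StopParsingException extends SAXException {
        StopParsingException() { super("target query fully read"); }
    }

    /** Collects the Hit_id values of one query, then aborts the parse. */
    static class TargetQueryHandler extends DefaultHandler {
        private final String target;
        private final List<String> hits = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private boolean inTarget = false;

        TargetQueryHandler(String target) { this.target = target; }

        List<String> hits() { return hits; }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            text.setLength(0); // start collecting text for this element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            String value = text.toString().trim();
            if (qName.equals("Iteration_query-def")) {
                inTarget = value.equals(target);
            } else if (qName.equals("Hit_id") && inTarget) {
                hits.add(value);
            } else if (qName.equals("Iteration") && inTarget) {
                // The record of interest is complete: signal the caller to stop.
                throw new StopParsingException();
            }
            text.setLength(0);
        }
    }

    /** Parses until the target query has been read, then returns its hit ids. */
    static List<String> findHits(String xml, String target) throws Exception {
        TargetQueryHandler handler = new TargetQueryHandler(target);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (StopParsingException stop) {
            // Expected: the handler asked us to stop early.
        }
        return handler.hits();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(findHits(SAMPLE, "contig_1"));
    }
}
```

Note that this only saves the time spent reading past the contig you want. Each new parse still starts from the top of the file, so it does not help with contigs near the end.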
However, there is a fundamental problem with the BlastLikeSAXParser. Because it is a SAX parser, it is not at all suited to bouncing around the file it is parsing: SAX parsing is event based. Therefore I think you need a different paradigm.

If you have lots of memory you could go with something that is more like a DOM parser, which reads the whole file into memory (or uses Java NIO to pretend to), and use something like XQuery to find what you want. If you are using BLAST XML output you could also build an object tree with JAXB and navigate that. You can also combine SAX and DOM to read memory-sized chunks in one go, but this can be clunky.

Note, I am assuming you will use BLAST XML. If you are not, I would strongly encourage it for the task you describe. It will also make your parsers much more robust to BLAST version changes.

Sorry the standard BioJava model can't really help here, but please consider posting your solution or adding it as a recipe in the cookbook, as others are sure to have similar problems soon.

- Mark

[email protected] wrote on 03/12/2009 11:00:38 AM:

> Hi Mark!
>
> The blast etc. is parallelized. The contigs are split into groups of 1000
> and I also modified my program in the way that it works now with all those
> separate files. But nevertheless I also have a program that works on the
> concatenated blast output. The parser with my customized handler is always
> looking for the results of a certain contig and then compares these
> results to something else and also does some other stuff in-between to
> calculate some statistics and then creates a new parser again to get the
> results for the next contig. So a System.exit() is not an option, since it
> would stop my whole program (in which I am using the parser). I also don't
> wanna start working with threads here.
> I was just hoping that there would
> be a way to tell the handler that, when a certain condition is met, it
> should give the parser a signal to stop parsing (and maybe even to reset
> itself to the first line). But I guess there's no way to do it in the
> customized handler...
>
> Thanks,
> Marcel
>
>
> [email protected] wrote:
> >
> > Hi -
> >
> > There are many ways to stop the parsing but it really depends on how you
> > have set the program up. Notably there is no way for the Blast parsing
> > system of BioJava to shut itself down but control probably shouldn't
> > happen at that level.
> >
> > A crude but effective procedure is to write out the results when you
> > find the hit of interest and then simply call System.exit()
> >
> > Another approach would be to spawn Tasks to parse each record and then
> > have them signal to the main thread when they are complete to shut them
> > down. If you are using Java 1.5 or earlier then you would need to do
> > this with Threads. If you have a later version you can use the
> > concurrent packages which are much nicer to deal with.
> >
> > One thing I don't understand is why you don't blast each contig
> > separately, in that case the results would only contain your hit of
> > interest. That means 90K separate blasts but there are versions of
> > blast that run on clusters and the database (3 million genes) is not
> > huge so it should be an embarrassingly parallel problem?
> >
> > - Mark
> >
> > [email protected] wrote on 03/10/2009 03:00:36 AM:
> >
> >> Hi Mark!
> >>
> >> Mark Schreiber wrote:
> >> > You could just customize BlastEcho to pass on the events of interest,
> >> > ignore those that are not interesting.
> >> That's what I am doing right now. But I don't know, how to tell my
> >> customized BlastEcho to stop, when a certain condition is met during a
> >> particular event call. What's the command for stopping there?
> >>
> >> > It could also exit if a certain
> >> > event occurs.
> >> How?
> >>
> >> > Remember it cost almost nothing to read the file so you
> >> > save time by only sending interesting events for parsing.
> >> Hmm, I am not sure, if it's really almost nothing, when I've about 90,000
> >> contigs that were blasted against a database with about maybe 3,000,000
> >> genes. The blast output that I am parsing is about 13Gig big and every
> >> cycle I am looking for the results of one particular contig of these
> >> 90,000 contigs. So I definitely experienced that the time sums up a lot,
> >> when it's running in each of these 90,000 cycles over the whole file,
> >> although the contig I am looking for was already at the beginning of the
> >> file.
> >>
> >>
> >> Cheers,
> >> Marcel
> _______________________________________________
> Biojava-l mailing list - [email protected]
> http://lists.open-bio.org/mailman/listinfo/biojava-l
