OK, thanks heaps 4 your help, Mark!
[email protected] wrote:
>
> Hi Marcel -
>
> One possible solution would be to customise the handler and the parser
> so they can talk to each other and the handler can make callbacks to
> the parser.
>
> However, there is a fundamental problem with the BlastLikeSAXParser.
> Because it is a SAX parser it is not at all suited to bouncing around
> the file it is parsing, because SAX parsing is event based. Therefore I
> think you need a different paradigm. If you have lots of memory you
> could go with something that is more like a DOM parser and reads the
> whole file into memory (or uses Java NIO to pretend to) and use
> something like XQuery to find what you want. If you are using BLAST XML
> output you could also build an object tree with JAXB and navigate that.
>
> You can also combine SAX and DOM to read memory-sized chunks in one go
> but this can be clunky.
>
> Note, I am assuming you will use BLAST XML. If you are not, I would
> strongly encourage it for the task you describe. It will also make your
> parsers much more robust to BLAST version changes.
>
> Sorry the standard BioJava model can't really help here, but please
> consider posting your solution or adding it as a recipe in the
> cookbook, as others are sure to have similar problems soon.
>
> - Mark
>
> [email protected] wrote on 03/12/2009 11:00:38 AM:
>
>> Hi Mark!
>>
>> The BLAST step etc. is parallelized. The contigs are split into groups
>> of 1000 and I also modified my program so that it now works with all
>> those separate files. But nevertheless I also have a program that works
>> on the concatenated BLAST output. The parser with my customized handler
>> is always looking for the results of a certain contig and then compares
>> these results to something else and also does some other stuff
>> in-between to calculate some statistics and then creates a new parser
>> again to get the results for the next contig. So a System.exit() is not
>> an option, since it would stop my whole program (in which I am using
>> the parser). I also don't wanna start working with threads here. I was
>> just hoping that there would be a way to tell the handler that, when a
>> certain condition is met, it should give the parser a signal to stop
>> parsing (and maybe even to reset itself to the first line). But I guess
>> there's no way to do it in the customized handler...
>>
>> Thanks,
>> Marcel
>>
>>
>> [email protected] wrote:
>> >
>> > Hi -
>> >
>> > There are many ways to stop the parsing but it really depends on how
>> > you have set the program up. Notably there is no way for the BLAST
>> > parsing system of BioJava to shut itself down, but control probably
>> > shouldn't happen at that level.
>> >
>> > A crude but effective procedure is to write out the results when you
>> > find the hit of interest and then simply call System.exit().
>> >
>> > Another approach would be to spawn Tasks to parse each record and
>> > then have them signal to the main thread when they are complete to
>> > shut them down. If you are using Java 1.4 or earlier then you would
>> > need to do this with Threads. If you have Java 5 or later you can use
>> > the java.util.concurrent packages, which are much nicer to deal with.
>> >
>> > One thing I don't understand is why you don't BLAST each contig
>> > separately; in that case the results would only contain your hit of
>> > interest. That means 90K separate BLAST runs, but there are versions
>> > of BLAST that run on clusters and the database (3 million genes) is
>> > not huge, so it should be an embarrassingly parallel problem?
>> >
>> > - Mark
>> >
>> > [email protected] wrote on 03/10/2009 03:00:36 AM:
>> >
>> >> Hi Mark!
>> >>
>> >> Mark Schreiber wrote:
>> >> > You could just customize BlastEcho to pass on the events of
>> >> > interest, ignore those that are not interesting.
>> >> That's what I am doing right now. But I don't know how to tell my
>> >> customized BlastEcho to stop when a certain condition is met during
>> >> a particular event call. What's the command for stopping there?
>> >>
>> >> > It could also exit if a certain
>> >> > event occurs.
>> >> How?
>> >>
>> >> > Remember it costs almost nothing to read the file so you
>> >> > save time by only sending interesting events for parsing.
>> >> Hmm, I am not sure if it's really almost nothing when I have about
>> >> 90,000 contigs that were blasted against a database with maybe
>> >> 3,000,000 genes. The BLAST output that I am parsing is about 13 GB
>> >> and in every cycle I am looking for the results of one particular
>> >> contig of these 90,000 contigs. So I definitely experienced that the
>> >> time adds up a lot when it runs over the whole file in each of these
>> >> 90,000 cycles, although the contig I am looking for was already at
>> >> the beginning of the file.
>> >>
>> >>
>> >> Cheers,
>> >> Marcel
>> _______________________________________________
>> Biojava-l mailing list - [email protected]
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>
> _________________________
>
> CONFIDENTIALITY NOTICE
>
> The information contained in this e-mail message is intended only for
> the exclusive use of the individual or entity named above and may
> contain information that is privileged, confidential or exempt from
> disclosure under applicable law. If the reader of this message is not
> the intended recipient, or the employee or agent responsible for
> delivery of the message to the intended recipient, you are hereby
> notified that any dissemination, distribution or copying of this
> communication is strictly prohibited. If you have received this
> communication in error, please notify the sender immediately by e-mail
> and delete the material from any computer. Thank you.
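
On the question in the thread of how a handler can tell the parser to stop once a condition is met: with a plain SAX parser, one common idiom is to throw a SAXException subclass from the handler and catch it around parse(). The sketch below is only an illustration under some assumptions. It reads BLAST XML (as Mark suggests) with the JDK's own SAX parser, not BioJava's BlastLikeSAXParser or BlastEcho. The names EarlyStopSketch, StopParsing, ContigHandler and hitsFor are hypothetical, and the element names Iteration, Iteration_query-def and Hit_id are assumed from the BLAST XML format, so check them against your own files.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class EarlyStopSketch {

    /** Hypothetical marker exception: means "done", not "parse error". */
    static final class StopParsing extends SAXException {
        StopParsing() { super("target contig fully read"); }
    }

    /** Collects the Hit_id values of one query, then aborts the parse. */
    static final class ContigHandler extends DefaultHandler {
        private final String target;
        private final List<String> hits = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private boolean inTarget = false;

        ContigHandler(String target) { this.target = target; }

        List<String> hits() { return hits; }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            text.setLength(0); // begin collecting text for the new element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            String value = text.toString().trim();
            text.setLength(0);
            if (qName.equals("Iteration_query-def")) {
                inTarget = value.equals(target);   // entering the wanted query?
            } else if (inTarget && qName.equals("Hit_id")) {
                hits.add(value);
            } else if (inTarget && qName.equals("Iteration")) {
                throw new StopParsing();           // rest of the file is not needed
            }
        }
    }

    /** Returns the hit ids for one contig, stopping as soon as they are read. */
    static List<String> hitsFor(String xml, String contig) {
        ContigHandler handler = new ContigHandler(contig);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (StopParsing done) {
            // expected: the handler asked the parser to stop early
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return handler.hits();
    }
}
```

Calling EarlyStopSketch.hitsFor(xml, "contig_1") returns the hit ids for that query without reading past its closing Iteration tag. For a multi-gigabyte file you would pass an InputSource over the file stream instead of a String. This only saves the time spent after the contig has been found; every new search still starts from the top of the file, so it does not replace the indexing or per-contig approaches discussed above.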
