On Wed, 9 Jul 2003, DeAngelo Lampkin wrote:

> Hi guys,
>
> First of all, thanks to Keith and Matthew on the assist with the last question. And
> to the rest, shame on you all for not helping sooner! :)

Fast paid-for service and development can hopefully be obtained from Biojava
Consulting. But as this is a voluntary sideline on my part, I suppose I do not
feel particularly obliged to jump when told to do so...

Alternatively, support is >sometimes< available on irc.freenode.net:6667,
channel #biojava. That is, of course, when we are not just messing around. A
fair proportion of the traffic is development-related.

>
> So now for my newest question concerning parsing Blast XML files;
> specifically the mangled XML files that come out as a result of a
> multiple query FASTA file search. I read the JavaDoc on
> BlastXMLParser
> and it made reference to a shell script (blast_aggregate) that massages
> the XML output into something that is, you know, *legal* XML. Is
> this an actual script floating around somewhere or was it something
> put in for illustrative purposes only? I could do it myself,
> I suppose, but while I like wheels as much as the next guy,
> I try to avoid reinventing them when possible.
>

This was the script used when this parser was much younger:-

#!/bin/sh
# Converts a Blast XML output to something vaguely well-formed
# for parsing.
# Use: blast_aggregate <XML output> <edited file>
# strips all <?xml> and <!DOCTYPE> tags
# encapsulates the multiple <BlastOutput> elements into <blast_aggregate>

sed '/<?xml/d' "$1" | sed '/<!DOCTYPE/d' | sed '1i\
<blast_aggregate>
$a\
</blast_aggregate>' > "$2"

==================

I wrote that and used it with a very early incarnation of this parser. The
trouble is I don't even understand it anymore. I think it just deletes every
<?xml> and <!DOCTYPE> line in turn, in a hacky way, and then prepends and
appends the root element.

Since then, the default SAX parser has changed and DTDs are now used in the
parsing, so I don't know whether the output will still work as expected. I know
the single copy case still works as I did a demo for that one very recently.

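If the one-line `i\` / `a\` handling of your sed differs from the one the script above was written against, the same wrapping can be sketched with grep and printf. This is only an illustrative variant, not the original script: the function name `aggregate` is made up here, and it assumes (as the sed version does) that every `<?xml ...>` declaration and `<!DOCTYPE ...>` declaration sits on a line of its own.

```shell
#!/bin/sh
# Hypothetical variant of blast_aggregate; "aggregate" is an illustrative name.
# Usage: aggregate <blast XML output> <wrapped file>
aggregate() {
    {
        # Open the single root element expected by the parser.
        printf '%s\n' '<blast_aggregate>'
        # Drop every line containing an XML declaration or a DOCTYPE.
        # Assumption: each of these occupies a whole line by itself.
        grep -v -e '<?xml' -e '<!DOCTYPE' "$1"
        # Close the root element.
        printf '%s\n' '</blast_aggregate>'
    } > "$2"
}
```

Calling something like `aggregate multi_query.xml wrapped.xml` (both file names are placeholders) should leave the concatenated <BlastOutput> elements inside one <blast_aggregate> root. Whether the current parser accepts the result is a separate question.
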
(Note that the required element is <blast_aggregate>, not <blast_aggregator>;
the javadocs have just been updated.)

There was a DTD problem reported by jinchen:-

http://biojava.org/pipermail/biojava-l/2003-June/003947.html

which I have been unable to reproduce.

There is also another masseur available:-

http://biojava.org/pipermail/biojava-l/2003-June/003933.html

but I don't think this one wraps it the same way. I haven't checked to see if
the output works just the same though. It should not be difficult to modify it
to do the same thing as the other script if that is a good thing. I haven't
used this code in anger since Nov last year.

I should point out that not all parts of the blast output XML are used: not
all of it can be force-fitted to the DTD standard we use for the interface
code. All that was discussed on the ML last month.

I would strongly recommend using the BlastXMLParserFacade class instead of the
BlastXMLParser class unless setting up StAX parsers appeals to you. There is a
demo of the use of the latter in the biojava-live repository.

D.H.

_______________________________________________
Biojava-l mailing list - [EMAIL PROTECTED]
http://biojava.org/mailman/listinfo/biojava-l