Shooting them would be too kind. A crueler punishment would be to force them to actually use it. Of course any mind diabolical enough to emit XML like that might actually take pleasure in such a thing. :)
-----Original Message-----
From: Matthew Pocock [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 10, 2003 1:40 AM
To: David Huen
Cc: DeAngelo Lampkin; [EMAIL PROTECTED]
Subject: Re: [Biojava-l] Massaging multi-query BLAST XML output...

Is there some individual responsible for these bioinf apps emitting
badly-formed XML that we can shoot very publicly at a conference as a
warning to all others? (grins)

Matthew

David Huen wrote:
> On Wed, 9 Jul 2003, DeAngelo Lampkin wrote:
>
>
>>Hi guys,
>>
>>First of all, thanks to Keith and Matthew on the assist with the last question. And
>>to the rest, shame on you all for not helping sooner! :)
>
>
> Fast paid-for service and development can hopefully be obtained from
> Biojava Consulting. But as this is a voluntary sideline on my part, I
> suppose I do not feel particularly obliged to jump when told to do so...
>
> Alternatively, support is >sometimes< available on irc.freenode.net:6667
> channel #biojava. That is, of course, when we are not just messing
> around. A fair proportion of the traffic is development-related.
>
>
>>So now for my newest question concerning parsing Blast XML files;
>>specifically the mangled XML files that come out as a result of a
>>multiple query FASTA file search. I read the JavaDoc on
>>BlastXMLParser
>>and it made reference to a shell script (blast_aggregate) that massages
>>the XML output into something that is, you know, *legal* XML. Is
>>this an actual script floating around somewhere or was it something
>>put in for illustrative purposes only? I could do it myself,
>>I suppose, but while I like wheels as much as the next guy,
>>I try to avoid reinventing them when possible.
>>
>
> This was the script used when this parser was much younger:-
> #!/bin/sh
> # Converts a Blast XML output to something vaguely well-formed
> # for parsing.
> # Use: blast_aggregate <XML output> <edited file>
>
> # strips all <?xml> and <!DOCTYPE> tags
> # encapsulates the multiple <BlastOutput> elements into <blast_aggregate>
>
>
> sed '/<?xml/d' "$1" | sed '/<!DOCTYPE/d' | sed '1i\
> <blast_aggregate>
> $a\
> </blast_aggregate>' > "$2"
>
> ==================
>
> Which I wrote and used with a very early incarnation of this parser. The
> trouble is I don't even understand it anymore. I think it just
> sequentially strips each element in a hacky way and then prepends and
> appends the root element. Since then, the default
> SAX parser has changed and DTDs are now used in the parsing, so I don't
> know whether the output will still work as expected. I know the single
> copy case still works as I did a demo for that one very recently.
> (Note that the required element is <blast_aggregate>, not
> <blast_aggregator>; the javadocs have just been updated.)
>
>
> There was a DTD problem reported by jinchen:-
> http://biojava.org/pipermail/biojava-l/2003-June/003947.html
>
> which I have been unable to reproduce.
>
> There is also another masseur available:-
> http://biojava.org/pipermail/biojava-l/2003-June/003933.html
>
> but I don't think this one wraps it the same way. I haven't checked to
> see if the output works just the same though. It should not be difficult
> to modify it to do the same thing as the other script if that is a good
> thing.
>
> I haven't used this code in anger since Nov last year. I should point out
> that not all parts of the blast output XML are used: not all of it can be
> force-fitted to the DTD standard we use for the interface code. All that
> was discussed on the ML last month.
>
> I would strongly recommend using the BlastXMLParserFacade class instead of
> the BlastXMLParser class unless setting up StAX parsers appeals to you.
> There is a demo of the use of the latter in the biojava-live repository.
>
> D.H.
>
>
> _______________________________________________
> Biojava-l mailing list - [EMAIL PROTECTED]
> http://biojava.org/mailman/listinfo/biojava-l
>
--
BioJava Consulting LTD - Support and training for BioJava
http://www.biojava.co.uk
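For comparison, here is a minimal sketch of the same blast_aggregate transformation written as a single shell function. It is a hypothetical rewrite, not the script referenced in the BlastXMLParser javadoc. It assumes that every <?xml ...?> declaration and every <!DOCTYPE ...> declaration occupies exactly one line, as in the concatenated multi-query output discussed in this thread. All it does is delete those lines and wrap what remains in a single <blast_aggregate> root element; the individual <BlastOutput> elements are left untouched.

```shell
# Hypothetical rewrite of blast_aggregate as a shell function.
# Use: blast_aggregate <XML output> <edited file>
# Assumption: each <?xml ...?> and <!DOCTYPE ...> declaration fits on one line.
blast_aggregate() {
    {
        # Open the single root element that makes the file well-formed
        printf '%s\n' '<blast_aggregate>'
        # Delete every line containing an XML declaration or a DOCTYPE declaration
        sed -e '/<?xml/d' -e '/<!DOCTYPE/d' "$1"
        # Close the root element after the last <BlastOutput> report
        printf '%s\n' '</blast_aggregate>'
    } > "$2"
}
```

A call such as "blast_aggregate blast_out.xml blast_wrapped.xml" (file names hypothetical) writes the wrapped report to the second file. Emitting the root tags with printf instead of sed's i\ and a\ commands avoids the multi-line sed syntax that made the original hard to read. Whether the result still passes the DTD-based parsing mentioned above is worth checking against real BLAST output.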