I normally use a bash script with sed for this, but I think it's still worth having.
Mind if I stick it into the javadocs somewhere within the blastXML package?

Regards,
David

On Thu, 26 Jun 2003, Russell Smithies wrote:

> Some people may call this cheating, but I wrote a simple utility
> pre-processor for BLAST XML that converts it into something a basic SAX
> parser can read :-)
>
> -----------------------------------------------------------
> import java.io.*;
>
> public class XMLPreProcessor {
>     /**
>      * A simple utility method to create a new XML file containing data
>      * converted from the default blast -m7 XML format into something that
>      * can be easily read by a standard SAX parser.
>      *
>      * @param inFileName  name of file in default blast -m7 format
>      * @param outFileName name of output file converted to SAX-parser
>      *                    compliant XML
>      * @author Russell Smithies
>      */
>     public void process(String inFileName, String outFileName) {
>         // try-with-resources closes both files, even if an exception is thrown
>         try (BufferedReader in = new BufferedReader(new FileReader(inFileName));
>              BufferedWriter out = new BufferedWriter(new FileWriter(outFileName))) {
>             // print XML version header
>             String line = in.readLine();
>             if (line != null) {
>                 out.write(line);
>                 out.newLine();
>             }
>             // readLine() returns null at end of file; ready() is not a reliable EOF test
>             while ((line = in.readLine()) != null) {
>                 if (line.indexOf("<!") >= 0) {
>                     // preserve single-line comments and DOCTYPE/DTD declarations
>                     out.write(line);
>                 } else if (line.indexOf(">") == line.length() - 1) {
>                     // container tags such as <Hit> or </Hit>, and blank lines
>                     out.write(line);
>                 } else {
>                     // rewrite <Name_field>value</Name_field> as <Name field="value"/>
>                     StringBuffer sb = new StringBuffer(line);
>                     sb.replace(sb.indexOf(">"), sb.indexOf(">") + 1, "=\"");
>                     sb.delete(sb.lastIndexOf("<"), sb.length() - 1);
>                     sb.insert(sb.length() - 1, "\"/");
>                     // turn the first '_' of the tag name (before the '=') into a space;
>                     // the guard avoids an exception on tags without '_'
>                     int underscore = sb.indexOf("_");
>                     if (underscore >= 0 && underscore < sb.indexOf("=")) {
>                         sb.replace(underscore, underscore + 1, " ");
>                     }
>                     out.write(sb.toString());
>                 }
>                 out.newLine();
>             }
>         } catch (IOException ex) {
>             ex.printStackTrace();
>         }
>     }
> }
> --------------------------------------------------------------------------
>
> It produces nice-looking XML, but it's probably not worth adding to BioJava.
>
> Russell
>
> _______________________________________________
> Biojava-l mailing list - [EMAIL PROTECTED]
> http://biojava.org/mailman/listinfo/biojava-l

David Huen, Ph.D.          Email: [EMAIL PROTECTED]
Dept. of Genetics          Fax  : +44 1223 333992
University of Cambridge    Phone: +44 1223 766748/333982
Cambridge, CB2 3EH
U.K.
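The rewrite in process() turns each one-line BLAST element such as <Hit_num>1</Hit_num> into an empty element carrying an attribute, <Hit num="1"/>, so a SAX handler can pick up every value in startElement() without buffering characters(). The sketch below restates that rule as a standalone method and feeds the result to the JDK's built-in JAXP SAX parser. The class name BlastLineDemo, the method convertLine, the sample input, the &quot; escaping of double quotes, and the fallback attribute name "value" for tags without an underscore are all hypothetical choices for illustration; they are not part of Russell's code or of BioJava.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class BlastLineDemo {

    // Hypothetical helper: rewrites one "simple" line the same way as
    // XMLPreProcessor.process(), e.g. <Hit_num>1</Hit_num> -> <Hit num="1"/>.
    // Declarations, DOCTYPE/comment lines, container tags and blank lines pass through.
    public static String convertLine(String line) {
        int start = line.indexOf('<');
        int gt = line.indexOf('>');
        int end = line.lastIndexOf("</");
        if (start < 0 || line.startsWith("<!", start) || line.startsWith("<?", start)
                || end <= gt) {
            return line;
        }
        String tag = line.substring(start + 1, gt);            // e.g. "Hit_num"
        // Text content may hold a raw '"', which would end the attribute early (assumption:
        // '&' and '<' are already escaped in the input, so they are left alone)
        String value = line.substring(gt + 1, end).replace("\"", "&quot;");
        int us = tag.indexOf('_');
        String element = us < 0 ? tag : tag.substring(0, us);  // "Hit"
        String attr = us < 0 ? "value" : tag.substring(us + 1); // "num" (hypothetical fallback)
        return line.substring(0, start) + "<" + element + " " + attr + "=\"" + value + "\"/>";
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical, DOCTYPE-free sample in the shape of blast -m7 output
        String[] blast = {
            "<?xml version=\"1.0\"?>",
            "<Hit>",
            "  <Hit_num>1</Hit_num>",
            "  <Hit_def>example \"quoted\" hit</Hit_def>",
            "</Hit>"
        };
        StringBuilder xml = new StringBuilder();
        for (String line : blast) {
            xml.append(convertLine(line)).append('\n');
        }
        System.out.print(xml);

        // With values moved into attributes, startElement() alone sees every field
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml.toString())), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                for (int i = 0; i < atts.getLength(); i++) {
                    System.out.println(qName + "." + atts.getQName(i) + " = " + atts.getValue(i));
                }
            }
        });
    }
}
```

Running main prints the converted document and then one name = value line per attribute. Real -m7 output keeps its DOCTYPE line through process(), so a non-validating parser may still try to fetch NCBI_BlastOutput.dtd; the sample above leaves it out.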