> (I'm interested in XML processing as well - also large files, though
> not for bio stuff)

> can you show a test case (actual source code, XML input data, and
> your performance measurements)?

Probably - the data file I used is a bit large (eight gigs), so it is
probably not ideal to ship around as a test case.

> what is meant by "the parsing is lazy" exactly?

I don't know, did I use that term?

> You want a BlastResult with a lazy list of results (containing
> BlastRecords with a lazy list of hits, etc)?

No - that is the case now, but I generally just discard the top
BlastResult "node" and extract the results -- as a lazy list (see the
second sketch below).

> but you still want to accept valid files only?

I can live with getting an error message after partial processing.
The XML is machine generated, so any error is an upstream software
error - to be fixed, after which the whole thing must be run again.
And tagsoup is lenient; I don't think it cares much about validity or
even well-formedness.

One thing that might work would be to replace the hierarchical
structure:

  BlastResult { ... results :: [BlastRecord] }
  BlastRecord { query, ...   hits    :: [BlastHit] }
  BlastHit    { target, ...  matches :: [BlastMatch] }
  BlastMatch  { position, ... }

with a flat one, e.g.:

  BlastFlat { query, target, position, ... }

This means repeating a lot of information in subsequent records, but
it would probably avoid the spikes in memory use, and would certainly
avoid the lists of sub-elements.  (A rough sketch follows in the PS
below.)

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants
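
PS: a rough, untested sketch of the flattening I have in mind.  The
declarations are simplified, and the types and the fQuery/fTarget/
fPosition field names are just placeholders for illustration:

  -- Untested sketch; the real records carry more fields than this.
  data BlastRecord = BlastRecord { query    :: String, hits    :: [BlastHit] }
  data BlastHit    = BlastHit    { target   :: String, matches :: [BlastMatch] }
  data BlastMatch  = BlastMatch  { position :: Int }

  -- One record per match: query and target are repeated in every
  -- element, but there are no nested lists to hold on to.
  data BlastFlat   = BlastFlat   { fQuery, fTarget :: String, fPosition :: Int }

  flatten :: [BlastRecord] -> [BlastFlat]
  flatten recs = [ BlastFlat (query r) (target h) (position m)
                 | r <- recs, h <- hits r, m <- matches h ]

Since the list comprehension is lazy, a consumer can fold over the
flat stream one match at a time.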
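
PPS: and regarding extracting the results as a lazy list, here is a
sketch of how that can look with tagsoup - untested, and assuming NCBI
BLAST XML where each per-query record is an <Iteration> element (the
element name is an assumption on my part):

  import Text.HTML.TagSoup

  -- parseTags is lazy, so the chunks can be consumed one at a time
  -- while the (large) file is still being read.  Assumes <Iteration>
  -- elements do not nest, which holds for BLAST XML.
  records :: String -> [[Tag String]]
  records = go . parseTags
    where
      go ts = case dropWhile (not . isTagOpenName "Iteration") ts of
        []  -> []
        ts' -> let (r, rest) = break (isTagCloseName "Iteration") ts'
               in (r ++ take 1 rest) : go (drop 1 rest)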
> can you show a test case (actual source code, > XML input data, and your performance measurements)? Probably - the data file I used is a bit large (eight gigs), so probably not ideal to ship around as a test case. > what is meant by "the parsing is lazy" exactly? I don't know, did I use that term? > You want a BlastResult with a lazy list of results > (containing BlastRecords with a lazy list of hits, etc)? No - that is the case now, but I generally just discard the top BlastResult "node", and extract the results -- as a lazy list. > but you still want to accept valid files only? I can live with getting an error message after partial processing. The XML is machine generated, so any error is an upstream software error - to be fixed, and then the whole thing must be run again. And tagsoup is lenient, I don't think it cares much about validity or even well-formedness. One thing that might work, would be to replace the hierarchical structure: BlastResult {... results :: [BlastRecord] } BlastRecord {query, etc... hits :: [BlastHit] } BlastHit {target, etc... matches :: [BlastMatch] } BlatMatch { position etc } with a flat one, e.g.: BlatFlat { query, target, position etc... } This means you will repeat lots of information in subsequent records, but would probably avoid the spikes in memory use, and certainly avoid the lists of sub-elements. -k -- If I haven't seen further, it is by standing in the footprints of giants