Re: File Chopping Algorithm

James G. Sack (jim) Tue, 12 Dec 2006 18:38:19 -0800

Andrew Lentvorski wrote:
> Todd Walton wrote:
> 
>> So, the script runs through the text file line by line, until it finds
>> the opening description tag and then, starting with the next line,
>> writes it all out to a new file until it comes to the end-description
>> tag.  Same for the other two.  Will this work?  If the blocks are out
>> of order in the datafile will this still work?
> 
> Possibly, but you're making an awful lot of work for yourself and it
> will be brittle if you need to add or remove sections over time.
> 
>> Should I change something?
> 
> Yes, this is the kind of thing that XML was actually made for.
> 
> Since you are already using "HTML-style" tags, I heartily recommend that
> you add just enough extra structure so that you can let any of the
> myriad XML DOM bindings just suck the whole file in and then work on it.
> 
> The magic keywords in Python are probably pulldom and/or elementtree.
> I'm sure Perl has something similar.
>
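
To make Andrew's suggestion concrete, here is a minimal sketch with
ElementTree. The section name "notes" and the wrapper element "datafile"
are made up for illustration (only the description tag is mentioned in
the thread), and it assumes the block bodies contain no bare "<" or "&",
which would need escaping before an XML parser will accept them.

```python
import xml.etree.ElementTree as ET

# Hypothetical data file: "notes" is an invented section name.
SAMPLE = """\
<description>
A short description.
</description>
<notes>
Some notes.
</notes>
"""

def split_sections(text):
    """Return {tag: body} for each top-level block in the data file."""
    # Wrap everything in one root element so the text is well-formed XML.
    root = ET.fromstring("<datafile>" + text + "</datafile>")
    return {child.tag: (child.text or "").strip() for child in root}

def write_sections(sections):
    """Write each block to <tag>.txt in the current directory."""
    for tag, body in sections.items():
        with open(tag + ".txt", "w") as out:
            out.write(body + "\n")
```

Because the parser reads the whole file before anything is written, the
order of the blocks in the data file does not matter, and adding a new
section needs no code change.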


Possibly useful additions:
 elementtidy, for fixing broken HTML/XML (*very nice*, based on the HTML
 Tidy program)
   http://effbot.org/zone/element-tidylib.htm
 XMLFilterBase and XMLGenerator, for building SAX filters:
   from xml.sax.saxutils import XMLFilterBase, XMLGenerator
   http://www-128.ibm.com/developerworks/xml/library/x-tipsaxflex.html

Uche Ogbuji has some good examples in his writings (especially those on
xml.com)
  http://uche.ogbuji.net/tech/akara/pyxml/
  (PS: the 4Suite package from his company transparently
       extends the standard Python XML modules)

Possibly also related:
  http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/84515
  http://effbot.org/zone/element-index.htm


Of course it all may be overkill if you're just going to do this once or
twice.
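
For the one-off case, the line-by-line approach Todd described is only a
few lines of Python. This is a sketch, assuming each opening and closing
tag sits alone on its own line; the function name chop and the tag names
in the usage note are placeholders.

```python
def chop(lines, tags):
    """Collect the lines between <tag> and </tag> for each tag in `tags`.

    Returns a dict mapping tag name to a list of lines. Lines outside
    any known block are ignored.
    """
    sections = {}
    current = None  # name of the block we are inside, if any
    for line in lines:
        marker = line.strip()
        if current is None:
            # Look for any known opening tag, in whatever order they appear.
            for tag in tags:
                if marker == "<%s>" % tag:
                    current = tag
                    sections[tag] = []
                    break
        elif marker == "</%s>" % current:
            current = None
        else:
            sections[current].append(line)
    return sections
```

Something like chop(open("datafile.txt"), ["description", "notes"]) then
gives one list of lines per block, ready to write out to separate files.
Since every opening tag is checked on every line, blocks that turn up out
of order in the data file are handled the same way.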

Regards,
..jim


-- 
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-list
