Todd Walton wrote:
>
> What I can assume about these files is that each will have three
> pre-defined blocks of text, enclosed by HTML style tags. The tags are
> on their own line. There may or may not be text outside of these
> three blocks. There may or may not be blank lines between the blocks.
> The blocks may or may not be in a given order. Etc.
If you know that the tags are of the <::LABEL::> </::LABEL::> variety,
where ::LABEL:: is a unique string that matches from open to close, then
this is easy, assuming of course that no other tags exist. If you know
the ::LABEL::s in advance, it is easier still.
If you know that the three section delimiters are the only sgml/html
style tags, you are okay. If there could be others, you could very
easily run into problems.
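For the case where the labels are known in advance, here is a rough Python sketch that pulls the blocks out of the whole file with one regular expression and a backreference. The labels Sec2 and Sec3 and the names LABELS, BLOCK and find_blocks are made up for illustration, and it assumes each tag sits alone on its line with nothing before or after it.

```python
import re

# Hypothetical labels; substitute the three real ones.
LABELS = ["Sec1", "Sec2", "Sec3"]

# Opening tag on its own line, lazily captured body, then the closing tag
# with the same label (the \1 backreference) on its own line.
alternatives = "|".join(re.escape(label) for label in LABELS)
BLOCK = re.compile(
    rf"^<({alternatives})>\n(.*?)^</\1>$",
    re.MULTILINE | re.DOTALL,
)


def find_blocks(text):
    """Map each known label to the text between its open and close tags."""
    return {m.group(1): m.group(2) for m in BLOCK.finditer(text)}
```

Tags with any other label are never matched, so they are left alone along with any text outside the three blocks.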
In HTML, tags can nest, so you might see something like:
<Sec1>
This is section one. In it, we have text like
<Sec1>
to indicate to the end user what the tag looks like, so when the section
is over, we would enter the closing tag on a line by itself:
</Sec1>
Easy, isn't it?
</Sec1>
Okay, contrived, but it gets the point across. If you can guarantee
that you will never see that, then it's rather simple:
for each line do:
    is LABEL known?
    yes: does line match the regular expression ^</LABEL>$ ?
        yes: forget LABEL
             increment SECTION
        no:  write line to SECTION
    no:  does line match the regular expression ^<(.*)>$ ?
        no:  skip this line (it is outside any block)
        yes: remember the bit inside the <> for later. call it LABEL
end.
I did not desk check this algorithm; flavour to suit your intended
language. It will break if the assumptions I made above are false.
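For what it's worth, here is one way the loop might look in Python. Treat it as a sketch under the same assumptions, not tested against real files. The names OPEN_TAG and split_sections are made up for illustration, sections are keyed by label instead of a counter, and lines outside any block are skipped, since there may be text or blank lines between blocks.

```python
import re

# An opening tag alone on its line, e.g. <Sec1>. A leading "/" is excluded
# so that a stray closing tag is not mistaken for an opening one.
OPEN_TAG = re.compile(r"^<([^/<>][^<>]*)>$")


def split_sections(lines):
    """Return {label: [lines]} for each tag-delimited block."""
    sections = {}
    label = None  # label of the block we are currently inside, if any
    for raw in lines:
        line = raw.rstrip("\r\n")
        if label is not None:
            if line == f"</{label}>":
                label = None  # closing tag: forget LABEL
            else:
                sections[label].append(line)  # write line to this section
        else:
            m = OPEN_TAG.match(line)
            if m:
                label = m.group(1)  # remember the bit inside the <>
                sections[label] = []
            # anything else is outside every block and is skipped
    return sections
```

Usage would be something like split_sections(open(path)), where path is whatever file you are processing. Fed the contrived nested example, it closes the section at the first closing tag and drops the rest, which is exactly the breakage described above.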
-john