Short version:

I have a string that contains several instances of names of the form 
00001.ts, 00002.ts, ..., 00xyz.ts and I want to find the last match.
That is, I want to find "00xyz.ts" or, alternatively, find all such names 
in sequence.

Longer version:

These are file names of a series of transport stream files containing 
audio, video, closed captions, etc.  The device generating them, a Tablo 
over-the-air video recorder, http://tabotv.com, breaks a recording into 
many small segments with these names.  A program can query the device at a 
particular URL and get a listing of the directory containing these 
segments, returned as XHTML.  I want all the file names in sequence so that 
I can download each of these files in sequence and create a single file by 
appending them.

Each name occurs multiple times in the string, but only once in a form 
that matches r">(\d+\.ts)<".

I can think of two ways of getting these names.  One is to parse the string 
as XHTML and walk through the object to find these names.  The other is to 
match the regular expression in the string, extract the "captures" field, 
match again starting at the current offset + 1, and continue until there 
are no further matches.
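
As a rough sketch of the second approach, written in Python purely for 
illustration: most regex libraries can iterate over all non-overlapping 
matches directly, so the manual "match again from offset + 1" loop may not 
be needed.  The sample listing and the name segment_names below are made 
up for the example; the real Tablo listing may look different.

```python
import re

# Pattern from the post: a run of digits followed by ".ts", sitting
# directly between ">" and "<" (i.e. as element text, not in an attribute).
SEGMENT_RE = re.compile(r">(\d+\.ts)<")


def segment_names(listing):
    """Return every captured segment name, in order of appearance."""
    # With exactly one capture group, findall returns the captured strings.
    return SEGMENT_RE.findall(listing)


# Hypothetical listing: each name appears twice (href and link text),
# but only the link text matches the pattern.
sample = (
    "<html><body><table>"
    '<tr><td><a href="00001.ts">00001.ts</a></td></tr>'
    '<tr><td><a href="00002.ts">00002.ts</a></td></tr>'
    '<tr><td><a href="00003.ts">00003.ts</a></td></tr>'
    "</table></body></html>"
)

names = segment_names(sample)
last = names[-1] if names else None  # the "last match" from the short version
```

If the device does not guarantee that the listing is in order, sorting the 
names would be a cheap safeguard, since the zero-padded numbers sort 
correctly as strings as long as they all have the same width.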

The XML approach is more elegant but not especially easy.  The regular 
expression matching is reasonably straightforward to implement but more 
fragile.
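
For comparison, the XML approach need not involve much walking if the 
listing really is well-formed XHTML.  The sketch below, again in Python and 
again using a made-up listing and a hypothetical function name, visits 
every element in document order and keeps those whose text is a segment 
name.  It assumes the document has no DOCTYPE-defined entities such as 
&nbsp;, which Python's standard XML parser would reject.

```python
import re
import xml.etree.ElementTree as ET

# Text of an element must be exactly digits followed by ".ts".
NAME_RE = re.compile(r"\d+\.ts")


def segment_names_from_xhtml(listing):
    """Parse the listing as XML and collect element texts that are segment names."""
    root = ET.fromstring(listing)
    names = []
    # iter() yields every element in document order, whatever its tag or
    # namespace, so the XHTML namespace needs no special handling here.
    for element in root.iter():
        text = (element.text or "").strip()
        if NAME_RE.fullmatch(text):
            names.append(text)
    return names


# Hypothetical listing; the real one may be structured differently.
sample_xhtml = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body><ul>'
    '<li><a href="00001.ts">00001.ts</a></li>'
    '<li><a href="00002.ts">00002.ts</a></li>'
    '<li><a href="index.html">index</a></li>'
    "</ul></body></html>"
)

xhtml_names = segment_names_from_xhtml(sample_xhtml)
```

This ignores the href attributes entirely, so each name is picked up only 
once, which is the same effect the ">" and "<" anchors have in the regular 
expression.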

Am I missing an elegant, robust approach here?
