Short version: I have a string that contains several names of the form 00001.ts, 00002.ts, ..., 00xyz.ts, and I want to find the last match. That is, I want to find "00xyz.ts" or, alternatively, find all such names in sequence.
Longer version: These are the file names of a series of transport stream files containing audio, video, closed captions, etc. The device generating them, a Tablo over-the-air video recorder (http://tabotv.com), breaks a recording into many small segments with these names. A program can query the device at a particular URL and get a listing of the directory containing these segments, returned as XHTML. I want all the file names in order so that I can download each file in turn and create a single file by appending them. Each name occurs multiple times in the string, but only once in a form that matches r">(\d+\.ts)<". I can think of two ways of getting these names. One is to parse the string as XHTML and walk the resulting object to find them. The other is to match the regular expression against the string, extract the "captures" field, match again starting at the current offset + 1, and continue until there are no further matches. The XML approach is more elegant but not especially easy. The regular expression matching is reasonably straightforward to implement but more fragile. Am I missing an elegant, robust approach here?
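
To make the two options concrete, here is a minimal sketch of both, written in Python purely for illustration (the question does not name a language). The listing in `SAMPLE` and the function names are made up; the real Tablo response may well be structured differently, so treat this as a sketch under those assumptions rather than working code for the device.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical directory listing. Each name appears twice (href and link
# text), but only once in the ">name<" form that the regex matches.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<a href="00001.ts">00001.ts</a>
<a href="00002.ts">00002.ts</a>
<a href="00003.ts">00003.ts</a>
</body></html>"""

# The pattern from the question: digits followed by ".ts", between > and <.
SEGMENT = re.compile(r">(\d+\.ts)<")


def segment_names_regex(listing):
    """Collect every captured name in order of appearance.

    findall scans the whole string once, so no manual
    "match again from offset + 1" loop is needed.
    """
    return SEGMENT.findall(listing)


def segment_names_xml(listing):
    """Parse the listing as XML and keep element text that looks like a segment.

    Assumes the listing is well-formed XML; ET.fromstring raises
    ParseError otherwise.
    """
    root = ET.fromstring(listing)
    names = []
    for elem in root.iter():  # document order
        text = (elem.text or "").strip()
        if re.fullmatch(r"\d+\.ts", text):
            names.append(text)
    return names


names = segment_names_regex(SAMPLE)
last_name = names[-1] if names else None  # the "last match" from the short version
```

With the full list in hand, the last match is simply the final element, and the same list gives the download order. The XML variant does not depend on the exact `>`/`<` delimiters, but it only works if the device really returns well-formed XHTML.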
