On Thursday, December 24, 2015 at 6:57:05 PM UTC-6, Ismael Venegas Castelló
wrote:
>
> Is something like this what you are looking for?
>
> julia> matchall(r"(\d+?.+?\.ts)", s)
> 3-element Array{SubString{UTF8String},1}:
> "00001.ts"
> "00002.ts"
> "00xyz.ts"
>
>
Thank you. I was missing the fact that matchall existed. I had a
suspicion that there was a function like that but I was unable to navigate
the documentation to it.
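
For anyone reading this later: as far as I know, matchall was deprecated in Julia 0.7 and is gone in 1.0 and later, where eachmatch covers the same ground. Below is a minimal sketch of collecting the names with the stricter pattern r">(\d+\.ts)<" from the original question. The sample string s is made up for illustration and is not the real Tablo listing.

```julia
# Hypothetical sample of the XHTML directory listing; the real device output may differ.
s = """
<a href="00001.ts">00001.ts</a>
<a href="00002.ts">00002.ts</a>
<a href="00003.ts">00003.ts</a>
"""

# eachmatch yields one RegexMatch per non-overlapping match, in order of appearance;
# captures[1] is the text matched by the parenthesized group.
names = [m.captures[1] for m in eachmatch(r">(\d+\.ts)<", s)]

# The last segment name. Note that last() throws an error if nothing matched.
last_name = last(names)
```

Because the pattern requires the surrounding > and <, the copies of each name inside the href attributes are skipped, so each name appears once.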
>
> On Thursday, December 24, 2015 at 12:30:52 (UTC-6), Douglas Bates
> wrote:
>>
>> Short version:
>>
>> I have a string that contains several instances of names of the form
>> 00001.ts, 00002.ts, ..., 00xyz.ts and I want to find the last match.
>> That is, I want to find "00xyz.ts" or, alternatively, find all such names
>> in sequence.
>>
>> Longer version:
>>
>> These are file names of a series of transport stream files containing
>> audio, video, closed captions, etc. The device generating them, a Tablo
>> over-the-air video recorder, http://tabotv.com, breaks a recording into
>> many small segments with these names. A program can query the device at a
>> particular URL and get a listing of the directory containing these
>> segments, returned as XHTML. I want all the file names in sequence so that
>> I can download each of these files in sequence and create a single file by
>> appending them.
>>
>> The names each occur multiple times in the string but each one only once
>> in the form that will match r">(\d+\.ts)<".
>>
>> I can think of two ways of getting these names. One is to parse the
>> string as XHTML and walk through the object to find these names. The other
>> is to match the regular expression in the string, extract the "captures"
>> field, match again starting at the current offset + 1, and continue until
>> there are no further matches.
>>
>> The XML approach is more elegant but not especially easy. The regular
>> expression matching is reasonably straightforward to implement but more
>> fragile.
>>
>> Am I missing an elegant, robust approach here?
>>
>
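
The manual approach described in the quoted message (match, take the capture, then match again further along the string) can be sketched roughly as follows. The function name capture_all is hypothetical, and this variant resumes just past the end of each match rather than at offset + 1. It assumes the pattern has at least one capture group that always participates and never matches the empty string.

```julia
# Collect the first capture of every match by rematching after the previous match.
# Assumes `re` has a capture group and never matches the empty string
# (an empty match would leave idx unchanged and loop forever).
function capture_all(re::Regex, s::AbstractString)
    found = String[]
    idx = firstindex(s)
    while true
        m = match(re, s, idx)          # search starting at index idx
        m === nothing && break         # no further matches
        push!(found, m.captures[1])
        # m.offset is the index where the match starts; skip past its last code unit.
        idx = m.offset + ncodeunits(m.match)
    end
    return found
end
```

Calling capture_all(r">(\d+\.ts)<", s) on the directory listing should then give the names in order of appearance, and last() of the result gives the final segment.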