string(bytestring(...)) seems to do it.  would appreciate any more 
efficient solutions, and confirmation that the analysis is correct.  is 
this worth filing as an issue?
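
for anyone reading this on a later julia, here is a rough sketch of two ways around the problem. it assumes julia 1.x (not the 0.4 trunk discussed in this thread), where String(...) makes an independent copy of a substring and SubString(s, i) gives a view into s. the names copies and views are just for illustration, and the string is shortened so the loop finishes quickly.

```julia
# sketch only: assumes julia 1.x, not the 0.4 trunk from the original report
s = repeat("a", 1000)   # much shorter than the original million characters
r = r"^\w"

# option 1: copy each match, so the sliced string it points into can be freed
copies = String[]
for i in 1:length(s)
    m = match(r, s[i:end])           # s[i:end] allocates a new string
    push!(copies, String(m.match))   # String(...) copies the matched bytes
end

# option 2: slice with a view, so s is never copied in the first place
views = SubString[]
for i in 1:length(s)
    m = match(r, SubString(s, i))    # view of s from index i to the end
    push!(views, m.match)            # m.match is itself a SubString
end
```

option 1 keeps the original loop and only changes what gets stored. option 2 avoids the per-iteration copy altogether, but every stored match then keeps s itself alive, which is fine here because there is only one s.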

On Tuesday, 21 July 2015 19:33:05 UTC-3, andrew cooke wrote:
>
>
> well, this was fun...  the following code rapidly triggers the OOM killer 
> on my machine (julia 0.4 trunk):
>
> s = repeat("a", 1000000)
> l = Any[]
> r = r"^\w"
>
> for i in 1:length(s)
>     m = match(r, s[i:end])
>     push!(l, m.match)
> end
>
> note that the regexp only matches one character, so the array l 
> holds at most a million one-character matches.
>
> what i think is happening (but this is only a guess) is that s[i:end] is 
> being passed through to the c level regexp library as a new string.  the 
> result (m.match) is then a substring into that.  because the substring is 
> kept around, the backing string cannot be collected.  and so there's an n^2 
> memory use.
>
> ideally, i don't think a new copy of the string should be passed to the 
> regexp engine.  maybe i am wrong?
>
> anyway, for now, if the above is right, i need some way to copy m.match.  
> as far as i can tell string() doesn't help.  so what works?  or am i wrong?
>
> thanks,
> andrew
>
