well, this was fun... the following code rapidly triggers the OOM killer
on my machine (julia 0.4 trunk):
s = repeat("a", 1000000)
l = Any[]
r = r"^\w"
for i in 1:length(s)
    m = match(r, s[i:end])
    push!(l, m.match)
end
note that the regexp only matches one character, so the array l holds at
most a million one-character matches.
what i think is happening (but this is only a guess) is that s[i:end] is
being passed through to the C-level regexp library as a new string. the
result (m.match) is then a substring into that. because the substring is
kept around, the backing string cannot be collected. and so memory use
grows as n^2.
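to put a rough number on that guess, here is a back-of-the-envelope sketch. it assumes (and this is only my assumption) that every s[i:end] slice is a full copy, one byte per character, and that none of the copies can be freed while the matches are held in l:

```julia
# assumption: each slice s[i:end] is copied and kept alive by its match
n = 1000000                          # length of s, one byte per character
total_bytes = div(n * (n + 1), 2)    # slice lengths n, n-1, ..., 1 summed
println(total_bytes / 1e9, " GB")    # total size of all retained copies
```

if that assumption holds, the retained copies add up to roughly 500 GB, which would explain the OOM killer.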
ideally, i don't think a new copy of the string should be passed to the
regexp engine. maybe i am wrong?
anyway, for now, if the above is right, i need some way to copy m.match.
as far as i can tell string() doesn't help. so what works? or am i wrong?
thanks,
andrew