i think that returns a substring (ie a view onto the backing string), but i am not sure. i did read a discussion somewhere saying that, because of this, you should use bytestring(...) before passing a string to c. that is all the evidence i have for my guess.
incidentally, match(...) has a method that takes the offset to start at as an argument, so i can avoid s[i:end] and just pass i into match (i just found this). however, somewhat surprisingly, it also has the same problem.

andrew

On Tuesday, 21 July 2015 20:15:58 UTC-3, Yichao Yu wrote:
>
> On Tue, Jul 21, 2015 at 7:08 PM, Jameson Nash <vtj...@gmail.com> wrote:
> > does `copy` work? although `bytestring` also seems like a good method for
> > this also. it seems wrong to me also that `match` is making a copy of the
> > original string (if that is indeed what it is doing)
>
> Isn't it `s[i:end]` that is doing the copy?
>
> >
> > On Tue, Jul 21, 2015 at 6:57 PM andrew cooke <and...@acooke.org> wrote:
> >>
> >> string(bytestring(...)) seems to do it. would appreciate any more
> >> efficient solutions (and confirmation the analysis is correct - is this
> >> worth filing as an issue?)
> >>
> >> On Tuesday, 21 July 2015 19:33:05 UTC-3, andrew cooke wrote:
> >>>
> >>> well, this was fun... the following code rapidly triggers the OOM killer
> >>> on my machine (julia 0.4 trunk):
> >>>
> >>> s = repeat("a", 1000000)
> >>> l = Any[]
> >>> r = r"^\w"
> >>>
> >>> for i in 1:length(s)
> >>>     m = match(r, s[i:end])
> >>>     push!(l, m.match)
> >>> end
> >>>
> >>> note that: (1) the regexp is only matching one character, so the array l
> >>> is at most a million characters long.
> >>>
> >>> what i think is happening (but this is only a guess) is that s[i:end] is
> >>> being passed through to the c level regexp library as a new string. the
> >>> result (m.match) is then a substring into that. because the substring is
> >>> kept around, the backing string cannot be collected. and so there's an n^2
> >>> memory use.
> >>>
> >>> ideally, i don't think a new copy of the string should be passed to the
> >>> regexp engine. maybe i am wrong?
> >>>
> >>> anyway, for now, if the above is right, i need some way to copy m.match.
> >>> as far as i can tell string() doesn't help. so what works? or am i wrong?
> >>>
> >>> thanks,
> >>> andrew
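
a minimal sketch of the two ideas discussed in this thread: passing a start offset to match instead of slicing with s[i:end], and copying m.match so that only the matched text is retained. it is written against Julia 1.x, which is an assumption, since the thread is about 0.4 trunk. there, bytestring(...) played the role that String(...) plays here. the string is shortened to 1000 characters and the regex is left unanchored, both choices made only for illustration. it does not claim to reproduce or explain the 0.4 memory behaviour reported above.

```julia
# Sketch for Julia 1.x (assumption: the thread itself targets 0.4 trunk).
s = repeat("a", 1000)   # shortened from the 1000000 used in the thread
r = r"\w"               # unanchored; how `^` interacts with a start offset is not checked here

l = String[]
for i in 1:length(s)    # ASCII-only string, so character index == byte index
    m = match(r, s, i)  # search starts at index i; no s[i:end] slice is created
    m === nothing && break
    # m.match is a SubString (a view onto s); String(...) copies just the
    # matched text, so the stored element does not reference a larger buffer.
    push!(l, String(m.match))
end
```

with this shape, each element of l is an independent one-character String rather than a view, which is the property the string(bytestring(...)) workaround was after.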