ok, so match(regex, string, index) solves the problem. presumably it exists for exactly this reason?
andrew

On Tuesday, 21 July 2015 20:23:57 UTC-3, andrew cooke wrote:
>
> hmm. ignore that last statement (same problem). still checking / confused. sorry.
>
> On Tuesday, 21 July 2015 20:20:46 UTC-3, andrew cooke wrote:
>>
>> i think that returns a substring (or a view onto the backing string).
>> but i am not sure. i did read a discussion somewhere saying that because
>> of this you should use bytestring(...) before passing a string to c. which
>> is all the evidence i have for my guess.
>>
>> incidentally, match(...) has a method that takes the offset to start at
>> as an argument. so i can avoid s[i:end] and just pass i into match (i just
>> found this).
>>
>> however, somewhat surprisingly, it also has the same problem.
>>
>> andrew
>>
>> On Tuesday, 21 July 2015 20:15:58 UTC-3, Yichao Yu wrote:
>>>
>>> On Tue, Jul 21, 2015 at 7:08 PM, Jameson Nash <vtj...@gmail.com> wrote:
>>> > does `copy` work? although `bytestring` also seems like a good method
>>> > for this. it also seems wrong to me that `match` is making a copy of
>>> > the original string (if that is indeed what it is doing)
>>>
>>> Isn't it `s[i:end]` that is doing the copy?
>>>
>>> >
>>> > On Tue, Jul 21, 2015 at 6:57 PM andrew cooke <and...@acooke.org> wrote:
>>> >>
>>> >> string(bytestring(...)) seems to do it. would appreciate any more
>>> >> efficient solutions (and confirmation the analysis is correct - is
>>> >> this worth filing as an issue?)
>>> >>
>>> >> On Tuesday, 21 July 2015 19:33:05 UTC-3, andrew cooke wrote:
>>> >>>
>>> >>> well, this was fun... the following code rapidly triggers the OOM
>>> >>> killer on my machine (julia 0.4 trunk):
>>> >>>
>>> >>> s = repeat("a", 1000000)
>>> >>> l = Any[]
>>> >>> r = r"^\w"
>>> >>>
>>> >>> for i in 1:length(s)
>>> >>>     m = match(r, s[i:end])
>>> >>>     push!(l, m.match)
>>> >>> end
>>> >>>
>>> >>> note that the regexp only matches one character, so the array l
>>> >>> holds at most a million characters.
>>> >>>
>>> >>> what i think is happening (but this is only a guess) is that
>>> >>> s[i:end] is being passed through to the c level regexp library as a
>>> >>> new string. the result (m.match) is then a substring into that.
>>> >>> because the substring is kept around, the backing string cannot be
>>> >>> collected. and so memory use is n^2.
>>> >>>
>>> >>> ideally, i don't think a new copy of the string should be passed to
>>> >>> the regexp engine. maybe i am wrong?
>>> >>>
>>> >>> anyway, for now, if the above is right, i need some way to copy
>>> >>> m.match. as far as i can tell string() doesn't help. so what works?
>>> >>> or am i wrong?
>>> >>>
>>> >>> thanks,
>>> >>> andrew
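for anyone reading this later, here is a rough sketch of the two workarounds discussed above. it is only a sketch under assumptions: it is written for julia 1.x rather than the 0.4 trunk used in the thread, it uses String(...) where the thread used bytestring(...), and the names copied, views and r2 are made up for illustration. the second loop uses an unanchored pattern, because how ^ behaves together with a start index is not settled in this thread.

```julia
# sketch for julia 1.x (assumption: the thread itself ran on 0.4, where
# bytestring(...) was used instead of String(...)).
s = repeat("a", 1000)   # much smaller than the thread's 1000000, for a quick run

# variant 1: slice, match, then copy the match into an independent String,
# so the result no longer refers to the slice it was matched against.
copied = String[]
r = r"^\w"
for i in 1:length(s)
    m = match(r, s[i:end])           # in julia 1.x, s[i:end] allocates a new String
    push!(copied, String(m.match))   # m.match is a SubString; String(...) copies it
end

# variant 2: pass the start index to match, so no slice is created and
# every m.match is a SubString of the single original string s.
views = SubString{String}[]
r2 = r"\w"                           # unanchored on purpose (see note above)
for i in 1:length(s)
    m = match(r2, s, i)
    push!(views, m.match)
end
```

variant 1 still allocates one slice per iteration, but each slice can be collected once the loop moves on. variant 2 avoids the slices entirely, at the cost of keeping s alive for as long as any of the substrings are held.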