ah. for some reason i was thinking they were invisible (somewhere below julia).
ok, thanks. so that explains things more clearly....

...except that(!) using SubString(s, i, endof(s)) and passing *that* to match still gives the memory issue. so there's still something odd that i don't understand. maybe it's just that the regexp lib doesn't know about SubString.

andrew

On Tuesday, 21 July 2015 20:32:53 UTC-3, Yichao Yu wrote:
>
> On Tue, Jul 21, 2015 at 7:26 PM, andrew cooke <and...@acooke.org> wrote:
> >
> > ok, so match(regex, string, index) solves the problem. presumably it exists
> > exactly for this reason....?
>
> At least I think this is a valid usecase.
>
> >
> > andrew
> >
> >
> > On Tuesday, 21 July 2015 20:23:57 UTC-3, andrew cooke wrote:
> >>
> >>
> >> hmm. ignore that last statement (same problem). still checking /
> >> confused. sorry.
> >>
> >> On Tuesday, 21 July 2015 20:20:46 UTC-3, andrew cooke wrote:
> >>>
> >>>
> >>> i think that returns a substring (ie a view onto the backing string).
>
> ```
> julia> typeof("aaa"[2:end])
> ASCIIString
>
> julia> SubString("aaa", 2, 3)
> "aa"
>
> julia> typeof(SubString("aaa", 2, 3))
> SubString{ASCIIString}
> ```
>
> >>> but i am not sure. i did read a discussion somewhere saying that because of
> >>> this you should use bytestring(...) before passing a string to c. which is
> >>> all the evidence i have for my guess.
> >>>
> >>> incidentally, match(...) has a method that takes the offset to start at
> >>> as an argument. so i can avoid s[i:end] and just pass i into match (i just
> >>> found this).
> >>>
> >>> however, somewhat surprisingly, it also has the same problem.
> >>>
> >>> andrew
> >>>
> >>>
> >>> On Tuesday, 21 July 2015 20:15:58 UTC-3, Yichao Yu wrote:
> >>>>
> >>>> On Tue, Jul 21, 2015 at 7:08 PM, Jameson Nash <vtj...@gmail.com> wrote:
> >>>> > does `copy` work? although `bytestring` also seems like a good method
> >>>> > for this also. it seems wrong to me also that `match` is making a copy of
> >>>> > the original string (if that is indeed what it is doing)
> >>>>
> >>>> Isn't it `s[i:end]` that is doing the copy?
> >>>>
> >>>> >
> >>>> > On Tue, Jul 21, 2015 at 6:57 PM andrew cooke <and...@acooke.org>
> >>>> > wrote:
> >>>> >>
> >>>> >>
> >>>> >> string(bytestring(...)) seems to do it. would appreciate any more
> >>>> >> efficient solutions (and confirmation the analysis is correct - is this
> >>>> >> worth filing as an issue?)
> >>>> >>
> >>>> >>
> >>>> >> On Tuesday, 21 July 2015 19:33:05 UTC-3, andrew cooke wrote:
> >>>> >>>
> >>>> >>>
> >>>> >>> well, this was fun... the following code rapidly triggers the OOM killer
> >>>> >>> on my machine (julia 0.4 trunk):
> >>>> >>>
> >>>> >>> s = repeat("a", 1000000)
> >>>> >>> l = Any[]
> >>>> >>> r = r"^\w"
> >>>> >>>
> >>>> >>> for i in 1:length(s)
> >>>> >>>     m = match(r, s[i:end])
> >>>> >>>     push!(l, m.match)
> >>>> >>> end
> >>>> >>>
> >>>> >>> note that: (1) the regexp is only matching one character, so the array l
> >>>> >>> is at most a million characters long.
> >>>> >>>
> >>>> >>> what i think is happening (but this is only a guess) is that s[i:end] is
> >>>> >>> being passed through to the c level regexp library as a new string. the
> >>>> >>> result (m.match) is then a substring into that. because the substring is
> >>>> >>> kept around, the backing string cannot be collected. and so there's
> >>>> >>> an n^2 memory use.
> >>>> >>>
> >>>> >>> ideally, i don't think a new copy of the string should be passed to the
> >>>> >>> regexp engine. maybe i am wrong?
> >>>> >>>
> >>>> >>> anyway, for now, if the above is right, i need some way to copy m.match.
> >>>> >>> as far as i can tell string() doesn't help. so what works? or am i
> >>>> >>> wrong?
> >>>> >>>
> >>>> >>> thanks,
> >>>> >>> andrew
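
for anyone reading this later: a minimal sketch of the "copy m.match" workaround asked about at the bottom of the thread. it assumes Julia 1.x, not the 0.4 trunk used above, so `String(...)` stands in for the `string(bytestring(...))` call mentioned earlier. the name `kept` and the shorter test string are made up for illustration, and none of this has been checked against 0.4.

```julia
# Sketch only: assumes Julia 1.x (the thread itself used 0.4 trunk).
s = repeat("a", 1_000)   # much shorter than the original so it runs quickly
r = r"^\w"
kept = String[]          # hypothetical name; stores independent copies

for i in 1:length(s)
    tail = s[i:end]                 # on 1.x this allocates a new String
    m = match(r, tail)
    m === nothing && continue       # match returns nothing when there is no match
    # m.match is a SubString that references `tail`; copying it into a
    # fresh String means `tail` is no longer referenced after this iteration.
    push!(kept, String(m.match))
end
```

the idea is just that the array holds one-character strings instead of views, so the per-iteration copies of the tail can be collected.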