I've been using ascii().
On Tuesday, July 21, 2015 at 7:38:28 PM UTC-4, andrew cooke wrote:
>
> ah. for some reason i was thinking they were invisible (somewhere below
> julia).
>
> ok, thanks. so that explains things more clearly....
>
> ...except that(!) using SubString(s, i, endof(s)) and passing *that* to
> match still gives the memory issue.
>
> so there's still something odd that i don't understand. maybe it's just
> that the regexp lib doesn't know about SubString.
>
> andrew
>
> On Tuesday, 21 July 2015 20:32:53 UTC-3, Yichao Yu wrote:
>>
>> On Tue, Jul 21, 2015 at 7:26 PM, andrew cooke <and...@acooke.org> wrote:
>> >
>> > ok, so match(regex, string, index) solves the problem. presumably it
>> > exists exactly for this reason....?
>>
>> At least I think this is a valid use case.
>>
>> >
>> > andrew
>> >
>> > On Tuesday, 21 July 2015 20:23:57 UTC-3, andrew cooke wrote:
>> >>
>> >> hmm. ignore that last statement (same problem). still checking /
>> >> confused. sorry.
>> >>
>> >> On Tuesday, 21 July 2015 20:20:46 UTC-3, andrew cooke wrote:
>> >>>
>> >>> i think that returns a substring (or a view onto the backing string).
>>
>> ```
>> julia> typeof("aaa"[2:end])
>> ASCIIString
>>
>> julia> SubString("aaa", 2, 3)
>> "aa"
>>
>> julia> typeof(SubString("aaa", 2, 3))
>> SubString{ASCIIString}
>> ```
>>
>> >>> but i am not sure. i did read a discussion somewhere saying that
>> >>> because of this you should use bytestring(...) before passing a string
>> >>> to c. which is all the evidence i have for my guess.
>> >>>
>> >>> incidentally, match(...) has a method that takes the offset to start
>> >>> at as an argument. so i can avoid s[i:end] and just pass i into match
>> >>> (i just found this).
>> >>>
>> >>> however, somewhat surprisingly, it also has the same problem.
>> >>>
>> >>> andrew
>> >>>
>> >>> On Tuesday, 21 July 2015 20:15:58 UTC-3, Yichao Yu wrote:
>> >>>>
>> >>>> On Tue, Jul 21, 2015 at 7:08 PM, Jameson Nash <vtj...@gmail.com> wrote:
>> >>>> > does `copy` work? although `bytestring` also seems like a good method
>> >>>> > for this also. it seems wrong to me also that `match` is making a
>> >>>> > copy of the original string (if that is indeed what it is doing)
>> >>>>
>> >>>> Isn't it `s[i:end]` that is doing the copy?
>> >>>>
>> >>>> > On Tue, Jul 21, 2015 at 6:57 PM andrew cooke <and...@acooke.org>
>> >>>> > wrote:
>> >>>> >>
>> >>>> >> string(bytestring(...)) seems to do it. would appreciate any more
>> >>>> >> efficient solutions (and confirmation the analysis is correct - is
>> >>>> >> this worth filing as an issue?)
>> >>>> >>
>> >>>> >> On Tuesday, 21 July 2015 19:33:05 UTC-3, andrew cooke wrote:
>> >>>> >>>
>> >>>> >>> well, this was fun... the following code rapidly triggers the OOM
>> >>>> >>> killer on my machine (julia 0.4 trunk):
>> >>>> >>>
>> >>>> >>> s = repeat("a", 1000000)
>> >>>> >>> l = Any[]
>> >>>> >>> r = r"^\w"
>> >>>> >>>
>> >>>> >>> for i in 1:length(s)
>> >>>> >>>     m = match(r, s[i:end])
>> >>>> >>>     push!(l, m.match)
>> >>>> >>> end
>> >>>> >>>
>> >>>> >>> note that: (1) the regexp is only matching one character, so the
>> >>>> >>> array l is at most a million characters long.
>> >>>> >>>
>> >>>> >>> what i think is happening (but this is only a guess) is that
>> >>>> >>> s[i:end] is being passed through to the c level regexp library as a
>> >>>> >>> new string. the result (m.match) is then a substring into that.
>> >>>> >>> because the substring is kept around, the backing string cannot be
>> >>>> >>> collected. and so there's an n^2 memory use.
>> >>>> >>>
>> >>>> >>> ideally, i don't think a new copy of the string should be passed
>> >>>> >>> to the regexp engine. maybe i am wrong?
>> >>>> >>>
>> >>>> >>> anyway, for now, if the above is right, i need some way to copy
>> >>>> >>> m.match. as far as i can tell string() doesn't help. so what works?
>> >>>> >>> or am i wrong?
>> >>>> >>>
>> >>>> >>> thanks,
>> >>>> >>> andrew
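If the diagnosis in the thread is right, simple arithmetic explains the OOM. Each `s[i:end]` copies the remaining n - i + 1 one-byte characters, and keeping every `m.match` alive keeps every copy alive. That means at least n(n+1)/2 bytes are retained, about 5 × 10^11 bytes (roughly 500 GB) for n = 10^6. What follows is a minimal sketch of the two workarounds discussed, assuming Julia 1.x, where `ASCIIString`, `bytestring` and `endof` no longer exist and `String(x)` makes a fresh copy of a `SubString`. The function names `first_chars_copy` and `first_chars_offset` are hypothetical. The `\G` anchor in the second variant is a substitution not taken from the thread. In PCRE, without multiline mode, `^` matches only at the start of the whole subject, so `match(r"^\w", s, i)` finds nothing for i > 1, whereas `\G` matches at the start offset.

```julia
# Sketch for Julia 1.x (ASCIIString, bytestring and endof from the 2015
# thread are gone; String and SubString{String} are used instead).

# Variant 1: keep the original loop, but copy each match.
# s[i:end] allocates a new String, and m.match is a SubString pointing into
# that copy, so storing it would keep the whole suffix alive. String(m.match)
# copies only the matched text, so each suffix becomes garbage right away.
function first_chars_copy(s::String)
    r = r"^\w"
    l = String[]
    for i in eachindex(s)
        m = match(r, s[i:end])
        m === nothing && continue
        push!(l, String(m.match))
    end
    return l
end

# Variant 2: never build the suffix. match(r, s, i) starts searching at
# index i of the original string. \G anchors the match at that start
# offset (^ would only match at index 1). The resulting SubStrings point
# into s itself, which is alive anyway, so nothing extra is retained.
function first_chars_offset(s::String)
    r = r"\G\w"
    l = SubString{String}[]
    for i in eachindex(s)
        m = match(r, s, i)
        m === nothing && continue
        push!(l, m.match)
    end
    return l
end

s = repeat("a", 1000)
a = first_chars_copy(s)
b = first_chars_offset(s)
println(length(a), " ", length(b))   # 1000 1000
println(a == b)                      # true
```

Variant 1 still copies every suffix, so it takes quadratic time, but each copy can be collected as soon as the match has been copied out. Variant 2 avoids the copies entirely.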