i think s[i:end] returns a substring (or a view onto the backing string), but
i am not sure. i did read a discussion somewhere saying that, because of
this, you should use bytestring(...) before passing a string to c. that is
all the evidence i have for my guess.
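for what it's worth, here is a minimal sketch of the same situation in current Julia (1.x), not the 0.4 trunk used in this thread. bytestring no longer exists there, so the sketch assumes String(x) is the way to make an independent copy of a SubString. it checks that a match is a SubString whose parent is the temporary s[i:end] copy (read through the SubString's internal `.string` field), then copies each match so those temporaries can be collected. s is shortened to 1000 characters so it runs quickly.

```julia
# Sketch for current Julia (1.x), not 0.4. Assumption: `bytestring` is gone,
# and String(x) is used to copy a SubString into a fresh, independent String.
s = repeat("a", 1000)   # much shorter than the original 1_000_000
r = r"^\w"

# A match on s[10:end] is a SubString whose parent is that temporary slice.
m = match(r, s[10:end])
@assert m !== nothing
@assert m.match isa SubString{String}
# `.string` is the SubString's internal reference to its parent string
@assert ncodeunits(m.match.string) == length(s) - 9

# Workaround: copy each one-character match so that the s[i:end] slice it
# came from is no longer referenced and can be garbage collected.
l = String[]
for i in 1:length(s)
    mi = match(r, s[i:end])
    push!(l, String(mi.match))   # copies only the matched bytes
end
@assert length(l) == length(s)
@assert all(==("a"), l)
```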

incidentally, match(...) has a method that takes the offset to start at as
an argument, so i can avoid s[i:end] and just pass i into match (i only
just found this).
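a sketch of the offset version, again assuming current Julia's match(r, s, i) rather than 0.4. one caveat, as far as i understand PCRE: when the start offset is nonzero, ^ never matches (unless multiline mode is on), so the sketch anchors with \G instead, which matches at the position where the search starts. in this version each match is a SubString of s itself, not of a per-iteration copy; the match is still copied with String(...) to be safe.

```julia
# Sketch for current Julia (1.x). Assumption: `^` cannot match at a nonzero
# start offset in PCRE, so `\G` (match at the search start position) is used.
s = repeat("a", 1000)
r = r"\G\w"

l = String[]
for i in 1:length(s)
    m = match(r, s, i)          # search starts at index i; no s[i:end] copy
    m === nothing && break
    push!(l, String(m.match))   # copy anyway, as discussed in the thread
end
@assert length(l) == length(s)

# The match's parent string is s itself (same underlying memory).
m5 = match(r, s, 5)
@assert m5 !== nothing
@assert pointer(m5.match.string) == pointer(s)
```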

however, somewhat surprisingly, it also has the same problem.

andrew


On Tuesday, 21 July 2015 20:15:58 UTC-3, Yichao Yu wrote:
>
> On Tue, Jul 21, 2015 at 7:08 PM, Jameson Nash <vtj...@gmail.com> wrote:
> > does `copy` work? `bytestring` also seems like a good method for this.
> > it also seems wrong to me that `match` is making a copy of the
> > original string (if that is indeed what it is doing).
>
> Isn't it `s[i:end]` that is doing the copy? 
>
> > 
> > On Tue, Jul 21, 2015 at 6:57 PM andrew cooke <and...@acooke.org> wrote:
> >> 
> >> 
> >> string(bytestring(...)) seems to do it. i would appreciate any more
> >> efficient solutions (and confirmation that the analysis is correct; is
> >> this worth filing as an issue?)
> >> 
> >> 
> >> On Tuesday, 21 July 2015 19:33:05 UTC-3, andrew cooke wrote: 
> >>> 
> >>> 
> >>> well, this was fun... the following code rapidly triggers the OOM
> >>> killer on my machine (julia 0.4 trunk):
> >>> 
> >>> s = repeat("a", 1000000) 
> >>> l = Any[] 
> >>> r = r"^\w" 
> >>> 
> >>> for i in 1:length(s) 
> >>>     m = match(r, s[i:end]) 
> >>>     push!(l, m.match) 
> >>> end 
> >>> 
> >>> note that the regexp only matches one character, so the array l
> >>> holds at most a million characters.
> >>> 
> >>> what i think is happening (but this is only a guess) is that s[i:end]
> >>> is being passed through to the c-level regexp library as a new string.
> >>> the result (m.match) is then a substring into that new string. because
> >>> the substring is kept around, the backing string cannot be collected,
> >>> and so memory use grows as n^2 (the slices total n(n+1)/2 bytes, about
> >>> 5*10^11 for n = 10^6).
> >>> 
> >>> ideally, i don't think a new copy of the string should be passed to
> >>> the regexp engine. maybe i am wrong?
> >>> 
> >>> anyway, for now, if the above is right, i need some way to copy
> >>> m.match. as far as i can tell, string() doesn't help. so what works?
> >>> or am i wrong?
> >>> 
> >>> thanks, 
> >>> andrew 
>
