ah.  for some reason i was thinking they were invisible (somewhere below 
julia).

ok, thanks.  so that explains things more clearly....

...except that(!) using SubString(s, i, endof(s)) and passing *that* to 
match still gives the memory issue.

so there's still something odd that i don't understand.  maybe it's just 
that the regexp lib doesn't know about SubString.
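
for reference, a minimal sketch of the workaround discussed in this thread, written for current julia (1.x) rather than the 0.4 trunk used above. the assumptions are mine: `lastindex` and `String(...)` stand in for `endof` and `string(bytestring(...))`, the string is shortened to 1000 characters, and the regex drops the `^` anchor because i am not sure an anchored pattern matches at a non-initial offset. the idea is to pass the offset to `match` instead of slicing, and to copy each match so that no view onto a large backing string is kept alive.

```julia
# sketch only: search from an offset and copy the match out
s = repeat("a", 1000)   # shortened from the original million characters
l = String[]
r = r"\w"               # un-anchored (assumption, see note above)

for i in 1:lastindex(s)
    m = match(r, s, i)          # start searching at index i, no s[i:end] slice
    m === nothing && break
    push!(l, String(m.match))   # m.match is a SubString; String(...) makes an independent copy
end
```

with this, `l` should hold only short independent strings, so memory use would be expected to grow linearly with the number of matches.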

andrew



On Tuesday, 21 July 2015 20:32:53 UTC-3, Yichao Yu wrote:
>
> On Tue, Jul 21, 2015 at 7:26 PM, andrew cooke <and...@acooke.org> wrote:
> > 
> > ok, so match(regex, string, index) solves the problem.  presumably it
> > exists exactly for this reason....?
>
> At least I think this is a valid usecase. 
>
> > 
> > andrew 
> > 
> > 
> > On Tuesday, 21 July 2015 20:23:57 UTC-3, andrew cooke wrote: 
> >> 
> >> 
> >> hmm.  ignore that last statement (same problem).  still checking / 
> >> confused.  sorry. 
> >> 
> >> On Tuesday, 21 July 2015 20:20:46 UTC-3, andrew cooke wrote: 
> >>> 
> >>> 
> >>> i think that returns a substring (ie a view onto the backing string).
>
> ``` 
> julia> typeof("aaa"[2:end]) 
> ASCIIString 
>
> julia> SubString("aaa", 2, 3) 
> "aa" 
>
> julia> typeof(SubString("aaa", 2, 3)) 
> SubString{ASCIIString} 
> ``` 
>
> >>> but i am not sure.  i did read a discussion somewhere saying that
> >>> because of this you should use bytestring(...) before passing a
> >>> string to c.  which is all the evidence i have for my guess.
> >>> 
> >>> incidentally, match(...) has a method that takes the offset to start
> >>> at as an argument.  so i can avoid s[i:end] and just pass i into match
> >>> (i just found this).
> >>> 
> >>> however, somewhat surprisingly, it also has the same problem. 
> >>> 
> >>> andrew 
> >>> 
> >>> 
> >>> On Tuesday, 21 July 2015 20:15:58 UTC-3, Yichao Yu wrote: 
> >>>> 
> >>>> On Tue, Jul 21, 2015 at 7:08 PM, Jameson Nash <vtj...@gmail.com> wrote:
> >>>> > does `copy` work? although `bytestring` also seems like a good
> >>>> > method for this also. it seems wrong to me also that `match` is
> >>>> > making a copy of the original string (if that is indeed what it
> >>>> > is doing)
> >>>> 
> >>>> Isn't it `s[i:end]` that is doing the copy? 
> >>>> 
> >>>> > 
> >>>> > On Tue, Jul 21, 2015 at 6:57 PM andrew cooke <and...@acooke.org> 
> >>>> > wrote: 
> >>>> >> 
> >>>> >> 
> >>>> >> string(bytestring(...)) seems to do it.  would appreciate any more 
> >>>> >> efficient solutions (and confirmation the analysis is correct - is 
> >>>> >> this 
> >>>> >> worth filing as an issue?) 
> >>>> >> 
> >>>> >> 
> >>>> >> On Tuesday, 21 July 2015 19:33:05 UTC-3, andrew cooke wrote: 
> >>>> >>> 
> >>>> >>> 
> >>>> >>> well, this was fun...  the following code rapidly triggers the
> >>>> >>> OOM killer on my machine (julia 0.4 trunk):
> >>>> >>> 
> >>>> >>> s = repeat("a", 1000000) 
> >>>> >>> l = Any[] 
> >>>> >>> r = r"^\w" 
> >>>> >>> 
> >>>> >>> for i in 1:length(s) 
> >>>> >>>     m = match(r, s[i:end]) 
> >>>> >>>     push!(l, m.match) 
> >>>> >>> end 
> >>>> >>> 
> >>>> >>> note that: (1) the regexp is only matching one character, so the
> >>>> >>> array l is at most a million characters long.
> >>>> >>>
> >>>> >>> what i think is happening (but this is only a guess) is that
> >>>> >>> s[i:end] is being passed through to the c level regexp library as
> >>>> >>> a new string.  the result (m.match) is then a substring into that.
> >>>> >>> because the substring is kept around, the backing string cannot be
> >>>> >>> collected.  and so there's an n^2 memory use.
> >>>> >>> 
> >>>> >>> ideally, i don't think a new copy of the string should be passed
> >>>> >>> to the regexp engine.  maybe i am wrong?
> >>>> >>>
> >>>> >>> anyway, for now, if the above is right, i need some way to copy
> >>>> >>> m.match.  as far as i can tell string() doesn't help.  so what
> >>>> >>> works?  or am i wrong?
> >>>> >>> 
> >>>> >>> thanks, 
> >>>> >>> andrew 
>
