hmm.  ignore that last statement (same problem).  still checking / 
confused.  sorry.

On Tuesday, 21 July 2015 20:20:46 UTC-3, andrew cooke wrote:
>
>
> i think that returns a substring (i.e. a view onto the backing string).  but 
> i am not sure.  i did read a discussion somewhere saying that because of 
> this you should use bytestring(...) before passing a string to c. which is 
> all the evidence i have for my guess.
>
> incidentally, match(...) has a method that takes the offset to start at as 
> an argument.  so i can avoid s[i:end] and just pass i into match (i just 
> found this).
>
> however, somewhat surprisingly, it also has the same problem.
>
> andrew
>
>
> On Tuesday, 21 July 2015 20:15:58 UTC-3, Yichao Yu wrote:
>>
>> On Tue, Jul 21, 2015 at 7:08 PM, Jameson Nash <vtj...@gmail.com> wrote: 
>> > does `copy` work? although `bytestring` also seems like a good method for
>> > this also. it seems wrong to me also that `match` is making a copy of the
>> > original string (if that is indeed what it is doing)
>>
>> Isn't it `s[i:end]` that is doing the copy? 
>>
>> > 
>> > On Tue, Jul 21, 2015 at 6:57 PM andrew cooke <and...@acooke.org> wrote:
>> >> 
>> >> 
>> >> string(bytestring(...)) seems to do it.  would appreciate any more
>> >> efficient solutions (and confirmation the analysis is correct - is this
>> >> worth filing as an issue?)
>> >> 
>> >> 
>> >> On Tuesday, 21 July 2015 19:33:05 UTC-3, andrew cooke wrote: 
>> >>> 
>> >>> 
>> >>> well, this was fun...  the following code rapidly triggers the OOM killer
>> >>> on my machine (julia 0.4 trunk):
>> >>> 
>> >>> s = repeat("a", 1000000) 
>> >>> l = Any[] 
>> >>> r = r"^\w" 
>> >>> 
>> >>> for i in 1:length(s) 
>> >>>     m = match(r, s[i:end]) 
>> >>>     push!(l, m.match) 
>> >>> end 
>> >>> 
>> >>> note that: (1) the regexp is only matching one character, so the array l
>> >>> is at most a million characters long.
>> >>> 
>> >>> what i think is happening (but this is only a guess) is that s[i:end] is
>> >>> being passed through to the c level regexp library as a new string.  the
>> >>> result (m.match) is then a substring into that.  because the substring is
>> >>> kept around, the backing string cannot be collected.  and so there's an n^2
>> >>> memory use.
>> >>> 
>> >>> ideally, i don't think a new copy of the string should be passed to the
>> >>> regexp engine.  maybe i am wrong?
>> >>> 
>> >>> anyway, for now, if the above is right, i need some way to copy m.match.
>> >>> as far as i can tell string() doesn't help.  so what works?  or am i wrong?
>> >>> 
>> >>> thanks, 
>> >>> andrew 
>>
>
