ok, so match(regex, string, index) solves the problem.  presumably it 
exists exactly for this reason....?
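
For reference, a minimal sketch of the loop using the offset method. The assumptions here are not from the thread: it is written for current Julia 1.x rather than 0.4 trunk, the string is shortened to 1000 characters, and the `^` anchor is dropped because an anchored pattern may not match at a non-zero start offset. `String(m.match)` is used as the copy; it is probably optional here, since each match should be a view onto the single original `s` rather than onto a fresh `s[i:end]`.

```julia
# Sketch only: assumes Julia 1.x semantics, not verified on 0.4 trunk.
s = repeat("a", 1000)   # shortened from the 1000000 in the original report
r = r"\w"               # unanchored on purpose (assumption, see note above)
l = String[]

# 1:length(s) is only safe here because s is pure ASCII,
# so every integer in the range is a valid string index.
for i in 1:length(s)
    m = match(r, s, i)          # start searching at index i, no s[i:end] temporary
    m === nothing && break      # stop if nothing matches from this offset
    push!(l, String(m.match))   # copy the matched text out of the SubString
end
```

If the copy is dropped, `l` holds substrings that all reference the same backing string, so memory should still grow linearly rather than quadratically.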

andrew

On Tuesday, 21 July 2015 20:23:57 UTC-3, andrew cooke wrote:
>
>
> hmm.  ignore that last statement (same problem).  still checking / 
> confused.  sorry.
>
> On Tuesday, 21 July 2015 20:20:46 UTC-3, andrew cooke wrote:
>>
>>
>> i think that returns a substring (or a view onto the backing string).  
>> but i am not sure.  i did read a discussion somewhere saying that because 
>> of this you should use bytestring(...) before passing a string to c. which 
>> is all the evidence i have for my guess.
>>
>> incidentally, match(...) has a method that takes the offset to start at 
>> as an argument.  so i can avoid s[i:end] and just pass i into match (i just 
>> found this).
>>
>> however, somewhat surprisingly, it also has the same problem.
>>
>> andrew
>>
>>
>> On Tuesday, 21 July 2015 20:15:58 UTC-3, Yichao Yu wrote:
>>>
>>> On Tue, Jul 21, 2015 at 7:08 PM, Jameson Nash <vtj...@gmail.com> wrote: 
>>> > does `copy` work? although `bytestring` also seems like a good method for
>>> > this also. it seems wrong to me also that `match` is making a copy of the
>>> > original string (if that is indeed what it is doing)
>>>
>>> Isn't it `s[i:end]` that is doing the copy? 
>>>
>>> > 
>>> > On Tue, Jul 21, 2015 at 6:57 PM andrew cooke <and...@acooke.org> wrote:
>>> >> 
>>> >> 
>>> >> string(bytestring(...)) seems to do it.  would appreciate any more
>>> >> efficient solutions (and confirmation the analysis is correct - is this
>>> >> worth filing as an issue?)
>>> >> 
>>> >> 
>>> >> On Tuesday, 21 July 2015 19:33:05 UTC-3, andrew cooke wrote: 
>>> >>> 
>>> >>> 
>>> >>> well, this was fun...  the following code rapidly triggers the OOM killer
>>> >>> on my machine (julia 0.4 trunk):
>>> >>> 
>>> >>> s = repeat("a", 1000000) 
>>> >>> l = Any[] 
>>> >>> r = r"^\w" 
>>> >>> 
>>> >>> for i in 1:length(s) 
>>> >>>     m = match(r, s[i:end]) 
>>> >>>     push!(l, m.match) 
>>> >>> end 
>>> >>> 
>>> >>> note that: (1) the regexp is only matching one character, so the array l
>>> >>> is at most a million characters long.
>>> >>> 
>>> >>> what i think is happening (but this is only a guess) is that s[i:end] is
>>> >>> being passed through to the c level regexp library as a new string.  the
>>> >>> result (m.match) is then a substring into that.  because the substring is
>>> >>> kept around, the backing string cannot be collected.  and so there's an n^2
>>> >>> memory use.
>>> >>> 
>>> >>> ideally, i don't think a new copy of the string should be passed to the
>>> >>> regexp engine.  maybe i am wrong?
>>> >>> 
>>> >>> anyway, for now, if the above is right, i need some way to copy m.match.
>>> >>> as far as i can tell string() doesn't help.  so what works?  or am i wrong?
>>> >>> 
>>> >>> thanks, 
>>> >>> andrew 
>>>
>>
