I've been using ascii().
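
For what it's worth, here is a minimal sketch of that applied to the loop quoted below. It assumes that ascii() called on the SubString in m.match returns a fresh string rather than another view, which is what I see but have not checked in the source. The shorter test string is only there to keep the run cheap.

```julia
s = repeat("a", 1000)   # much shorter than the original million characters
l = Any[]
r = r"^\w"

for i in 1:length(s)
    m = match(r, s[i:end])
    # ascii(...) is assumed to copy the one matched character into a new
    # string, so nothing stored in l refers back to the temporary s[i:end]
    push!(l, ascii(m.match))
end
```

If the analysis quoted below is right, storing m.match directly keeps every s[i:end] alive, which is roughly n(n+1)/2 bytes for a string of length n, or about 500 GB for the million-character case. Storing the copy keeps only one character per entry.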

On Tuesday, July 21, 2015 at 7:38:28 PM UTC-4, andrew cooke wrote:
>
>
> ah.  for some reason i was thinking they were invisible (somewhere below 
> julia).
>
> ok, thanks.  so that explains things more clearly....
>
> ...except that(!) using SubString(s, i, endof(s)) and passing *that* to 
> match still gives the memory issue.
>
> so there's still something odd that i don't understand.  maybe it's just 
> that the regexp lib doesn't know about SubString.
>
> andrew
>
>
>
> On Tuesday, 21 July 2015 20:32:53 UTC-3, Yichao Yu wrote:
>>
>> On Tue, Jul 21, 2015 at 7:26 PM, andrew cooke <and...@acooke.org> wrote: 
>> > 
>> > ok, so match(regex, string, index) solves the problem.  presumably it
>> > exists exactly for this reason....?
>>
>> At least I think this is a valid use case.
>>
>> > 
>> > andrew 
>> > 
>> > 
>> > On Tuesday, 21 July 2015 20:23:57 UTC-3, andrew cooke wrote: 
>> >> 
>> >> 
>> >> hmm.  ignore that last statement (same problem).  still checking / 
>> >> confused.  sorry. 
>> >> 
>> >> On Tuesday, 21 July 2015 20:20:46 UTC-3, andrew cooke wrote: 
>> >>> 
>> >>> 
>> >>> i think that returns a substring (ie a view onto the backing string).
>>
>> ``` 
>> julia> typeof("aaa"[2:end]) 
>> ASCIIString 
>>
>> julia> SubString("aaa", 2, 3) 
>> "aa" 
>>
>> julia> typeof(SubString("aaa", 2, 3)) 
>> SubString{ASCIIString} 
>> ``` 
>>
>> >>> but i am not sure.  i did read a discussion somewhere saying that
>> >>> because of this you should use bytestring(...) before passing a
>> >>> string to c.  which is all the evidence i have for my guess.
>> >>> 
>> >>> incidentally, match(...) has a method that takes the offset to start
>> >>> at as an argument.  so i can avoid s[i:end] and just pass i into
>> >>> match (i just found this).
>> >>> 
>> >>> however, somewhat surprisingly, it also has the same problem. 
>> >>> 
>> >>> andrew 
>> >>> 
>> >>> 
>> >>> On Tuesday, 21 July 2015 20:15:58 UTC-3, Yichao Yu wrote: 
>> >>>> 
>> >>>> On Tue, Jul 21, 2015 at 7:08 PM, Jameson Nash <vtj...@gmail.com> wrote:
>> >>>> > does `copy` work? although `bytestring` also seems like a good
>> >>>> > method for this also. it seems wrong to me also that `match` is
>> >>>> > making a copy of the original string (if that is indeed what it
>> >>>> > is doing)
>> >>>> 
>> >>>> Isn't it `s[i:end]` that is doing the copy? 
>> >>>> 
>> >>>> > 
>> >>>> > On Tue, Jul 21, 2015 at 6:57 PM andrew cooke <and...@acooke.org> wrote:
>> >>>> >> 
>> >>>> >> 
>> >>>> >> string(bytestring(...)) seems to do it.  would appreciate any more
>> >>>> >> efficient solutions (and confirmation the analysis is correct - is
>> >>>> >> this worth filing as an issue?)
>> >>>> >> 
>> >>>> >> 
>> >>>> >> On Tuesday, 21 July 2015 19:33:05 UTC-3, andrew cooke wrote: 
>> >>>> >>> 
>> >>>> >>> 
>> >>>> >>> well, this was fun...  the following code rapidly triggers the OOM
>> >>>> >>> killer on my machine (julia 0.4 trunk):
>> >>>> >>> 
>> >>>> >>> s = repeat("a", 1000000) 
>> >>>> >>> l = Any[] 
>> >>>> >>> r = r"^\w" 
>> >>>> >>> 
>> >>>> >>> for i in 1:length(s) 
>> >>>> >>>     m = match(r, s[i:end]) 
>> >>>> >>>     push!(l, m.match) 
>> >>>> >>> end 
>> >>>> >>> 
>> >>>> >>> note that: (1) the regexp is only matching one character, so the
>> >>>> >>> array l is at most a million characters long.
>> >>>> >>> 
>> >>>> >>> what i think is happening (but this is only a guess) is that
>> >>>> >>> s[i:end] is being passed through to the c level regexp library as
>> >>>> >>> a new string.  the result (m.match) is then a substring into that.
>> >>>> >>> because the substring is kept around, the backing string cannot be
>> >>>> >>> collected.  and so there's an n^2 memory use.
>> >>>> >>> 
>> >>>> >>> ideally, i don't think a new copy of the string should be passed to
>> >>>> >>> the regexp engine.  maybe i am wrong?
>> >>>> >>> 
>> >>>> >>> anyway, for now, if the above is right, i need some way to copy
>> >>>> >>> m.match.  as far as i can tell string() doesn't help.  so what
>> >>>> >>> works?  or am i wrong?
>> >>>> >>> 
>> >>>> >>> thanks, 
>> >>>> >>> andrew 
>>
>
