On Tue, Jul 21, 2015 at 7:26 PM, andrew cooke <and...@acooke.org> wrote:
>
> ok, so match(regex, string, index) solves the problem.  presumably it exists
> exactly for this reason....?

At least I think this is a valid use case.
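
For reference, a minimal sketch of the offset form, not a tested fix for the original report. It assumes that `match(r, s, i)` starts searching at index `i` of the original string and that `m.match` is then a substring of `s` itself rather than of a fresh slice. The `^` anchor is dropped on the assumption that, with a start offset, `^` still refers to the start of the whole string. The input is shortened only to keep the example quick.

```julia
# Shorter than the original 1000000 characters so the sketch runs quickly.
s = repeat("a", 1000)
l = Any[]
# No `^` anchor (assumption: with an offset, `^` would still mean the
# start of the whole string, so it would not match at i > 1).
r = r"\w"

for i in 1:length(s)
    # Search from index i in s; no s[i:end] slice is created.
    m = match(r, s, i)
    m === nothing && break
    push!(l, m.match)
end
```

Here every element of `l` refers back to the single string `s`, so keeping the matches around should not pin a separate slice per iteration.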

>
> andrew
>
>
> On Tuesday, 21 July 2015 20:23:57 UTC-3, andrew cooke wrote:
>>
>>
>> hmm.  ignore that last statement (same problem).  still checking /
>> confused.  sorry.
>>
>> On Tuesday, 21 July 2015 20:20:46 UTC-3, andrew cooke wrote:
>>>
>>>
>>> i think that returns a substring (i.e. a view onto the backing string).

```
julia> typeof("aaa"[2:end])
ASCIIString

julia> SubString("aaa", 2, 3)
"aa"

julia> typeof(SubString("aaa", 2, 3))
SubString{ASCIIString}
```

>>> but i am not sure.  i did read a discussion somewhere saying that because of
>>> this you should use bytestring(...) before passing a string to c. which is
>>> all the evidence i have for my guess.
>>>
>>> incidentally, match(...) has a method that takes the offset to start at
>>> as an argument.  so i can avoid s[i:end] and just pass i into match (i just
>>> found this).
>>>
>>> however, somewhat surprisingly, it also has the same problem.
>>>
>>> andrew
>>>
>>>
>>> On Tuesday, 21 July 2015 20:15:58 UTC-3, Yichao Yu wrote:
>>>>
>>>> On Tue, Jul 21, 2015 at 7:08 PM, Jameson Nash <vtj...@gmail.com> wrote:
>>>> > does `copy` work? although `bytestring` seems like a good method for
>>>> > this too. it also seems wrong to me that `match` is making a copy of
>>>> > the original string (if that is indeed what it is doing)
>>>>
>>>> Isn't it `s[i:end]` that is doing the copy?
>>>>
>>>> >
>>>> > On Tue, Jul 21, 2015 at 6:57 PM andrew cooke <and...@acooke.org>
>>>> > wrote:
>>>> >>
>>>> >>
>>>> >> string(bytestring(...)) seems to do it.  would appreciate any more
>>>> >> efficient solutions (and confirmation the analysis is correct - is
>>>> >> this
>>>> >> worth filing as an issue?)
>>>> >>
>>>> >>
>>>> >> On Tuesday, 21 July 2015 19:33:05 UTC-3, andrew cooke wrote:
>>>> >>>
>>>> >>>
>>>> >>> well, this was fun...  the following code rapidly triggers the OOM
>>>> >>> killer
>>>> >>> on my machine (julia 0.4 trunk):
>>>> >>>
>>>> >>> s = repeat("a", 1000000)
>>>> >>> l = Any[]
>>>> >>> r = r"^\w"
>>>> >>>
>>>> >>> for i in 1:length(s)
>>>> >>>     m = match(r, s[i:end])
>>>> >>>     push!(l, m.match)
>>>> >>> end
>>>> >>>
>>>> >>> note that: (1) the regexp is only matching one character, so the
>>>> >>> array l
>>>> >>> is at most a million characters long.
>>>> >>>
>>>> >>> what i think is happening (but this is only a guess) is that
>>>> >>> s[i:end] is
>>>> >>> being passed through to the c level regexp library as a new string.
>>>> >>> the
>>>> >>> result (m.match) is then a substring into that.  because the
>>>> >>> substring is
>>>> >>> kept around, the backing string cannot be collected.  and so there's
>>>> >>> an n^2
>>>> >>> memory use.
>>>> >>>
>>>> >>> ideally, i don't think a new copy of the string should be passed to
>>>> >>> the
>>>> >>> regexp engine.  maybe i am wrong?
>>>> >>>
>>>> >>> anyway, for now, if the above is right, i need some way to copy
>>>> >>> m.match.
>>>> >>> as far as i can tell string() doesn't help.  so what works?  or am i
>>>> >>> wrong?
>>>> >>>
>>>> >>> thanks,
>>>> >>> andrew
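
A sketch of the copy workaround mentioned in the thread, applied to the `s[i:end]` form of the loop. It assumes a recent Julia, where `String(m.match)` builds an independent copy of the matched text; on 0.4 the thread reports that `string(bytestring(...))` does the job. The input is shortened so the example runs quickly.

```julia
# Shorter than the original 1000000 characters so the sketch runs quickly.
s = repeat("a", 1000)
l = String[]
r = r"^\w"

for i in 1:length(s)
    # s[i:end] produces a separate string for each iteration.
    m = match(r, s[i:end])
    m === nothing && continue
    # Copy the matched text (assumption: String(...) on a substring
    # allocates a new, independent string), so the stored value does
    # not keep a reference to the slice.
    push!(l, String(m.match))
end
```

With the copy, `l` holds only one-character strings, so each slice should become collectable once its iteration ends. Each slice is still allocated, though, so this avoids the retention but not the per-iteration copying.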
