I'm in the middle of profiling my code, and I noticed that I'm paying a
penalty for doing a lot of substring copies. For what I'm doing, I don't
actually need copies of the string; I just want to keep a reference to the
original string along with the range the view occupies. I wrote up a
quick test to see if I could speed things up (please ignore the horrible
names):
import Base: getindex, endof
immutable StringView
    value::String
    first::Int64
    last::Int64
end

immutable FastString <: String
    value::String
end

getindex(s::FastString, r::UnitRange{Int64}) = StringView(s.value, r.start, r.stop)
endof(s::FastString) = endof(s.value)
const size = 10000000
function teststring()
    s = randstring(size)
    for i=1:size-10
        value = s[i:(i+10)]
    end
end

function teststringview()
    local s::FastString = FastString(randstring(size))
    for i=1:size-10
        value = s[i:(i+10)]
    end
end
teststring()
@time teststring()
teststringview()
@time teststringview()
The results I get (teststring first, then teststringview) are:
elapsed time: 0.910467582 seconds (890006256 bytes allocated)
elapsed time: 0.392893967 seconds (409999912 bytes allocated)
The speed-up isn't incredible (about 2.3x, with a bit under half the
allocation), but if you're doing a lot of text processing, it might help.
I'd be curious whether anyone has thoughts or better ways of doing this,
or, for that matter, reasons why this may be a bad idea.
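
For comparison, Base has a SubString type built on the same idea: it keeps
a reference to the parent string plus an offset and a length instead of
copying characters. The snippet below is only a minimal sketch of how it
might stand in for StringView. It assumes the three-argument constructor
SubString(s, i, j), uses a made-up alphabet string in place of randstring,
and has not been benchmarked against the numbers above:

```julia
# Illustrative parent string (hypothetical, not the benchmark input).
s = "abcdefghijklmnopqrstuvwxyz"

# View of characters 3 through 7 of s; the characters are not copied.
v = SubString(s, 3, 7)

# The view compares and measures like an ordinary string.
println(v == "cdefg")   # true
println(length(v))      # 5
```

Whether this beats the hand-rolled StringView in the loop above would need
to be measured with the same @time setup.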