On 15.08.2011 00:21, Levente Uzonyi wrote:
> On Sun, 14 Aug 2011, Philippe Marschall wrote:
> 
>> On 14.08.2011 22:00, Levente Uzonyi wrote:
>>> On Sun, 14 Aug 2011, Philippe Marschall wrote:
>>>
>>>> Hi
>>>>
>>>> In Seaside we get a lot of performance gains out of
>>>> primitiveFindFirstInString. One thing that always annoyed me a bit is
>>>> that it's not as optimized as it could be.
>>>>
>>>> The inclusionMap you give it are 256 consecutive boolean values (0 or
>>>> 1). There is no need for this to be a 256 element ByteArray when each
>>>> element can only be 0 or 1. We could as well make it a 32 element
>>>> ByteArray and each byte holding eight bit values. Instead of using the
>>>> asciiValue to directly index into the inclusionMap we would use the top
>>>> five bits to index into the inclusionMap and the bottom 3 bits to
>>>> "index
>>>> into the byte".
>>>>
>>>> Did that make any sense?
>>>
>>> Do you want to save space?
>>
>> I'm trying to trade memory access (which is slow) for a bit shift and
>> two bit ands (which is fast).
> 
> 256 bytes + object header easily fit into the L1 cache of any recent x86
> CPU, even the old P4's had 8kB of it.

Sure once it's in the cache then it's in the cache. But a cache line is
only 64 bytes. So a 32 element array should fit and a 256 element
doesn't. So a 32 element array should be there in one memory access
while a 256 element array probably takes five.

> Moving less bytes around may make
> some difference though, but if you create your ByteArrays right before
> you use them, then it's pretty likely that your object is already in the
> cache.

I don't of course.

>>
>>> Are you storing lots of inclusion maps (maybe
>>> CharacterSets)? If not, then IMHO it's not worth to adding this feature
>>> to this primitive, because runtime performance will be worse on both the
>>> image side and the VM side.
>>
>> Why?
> 
> Filling the contents of a compact 32 sized ByteArray in the image takes
> more time than a 256 byte ByteArray. On the VM side memory access +
> shifts + bitands cost more than just memory access.

Uhm, it's a primitive. My understanding is this would result in "C
shifts + bitands".

> But (as usual) a
> benchmark can tell if it's really worth to do it or not.

This there somewhere some documentation how one adds a primitive and
compiles a VM (Cog ideally)?

Cheers
Philippe


Reply via email to