On 15.08.2011 00:21, Levente Uzonyi wrote:
> On Sun, 14 Aug 2011, Philippe Marschall wrote:
>
>> On 14.08.2011 22:00, Levente Uzonyi wrote:
>>> On Sun, 14 Aug 2011, Philippe Marschall wrote:
>>>
>>>> Hi
>>>>
>>>> In Seaside we get a lot of performance gains out of
>>>> primitiveFindFirstInString. One thing that always annoyed me a bit is
>>>> that it's not as optimized as it could be.
>>>>
>>>> The inclusionMap you give it are 256 consecutive boolean values (0 or
>>>> 1). There is no need for this to be a 256 element ByteArray when each
>>>> element can only be 0 or 1. We could as well make it a 32 element
>>>> ByteArray and each byte holding eight bit values. Instead of using the
>>>> asciiValue to directly index into the inclusionMap we would use the top
>>>> five bits to index into the inclusionMap and the bottom 3 bits to
>>>> "index into the byte".
>>>>
>>>> Did that make any sense?
>>>
>>> Do you want to save space?
>>
>> I'm trying to trade memory access (which is slow) for a bit shift and
>> two bit ands (which is fast).
>
> 256 bytes + object header easily fit into the L1 cache of any recent x86
> CPU, even the old P4's had 8kB of it.

Sure, once it's in the cache then it's in the cache. But a cache line is
only 64 bytes, so a 32 element array should fit in one line and a 256
element one doesn't. So a 32 element array should be there in one memory
access while a 256 element array probably takes five.

> Moving less bytes around may make
> some difference though, but if you create your ByteArrays right before
> you use them, then it's pretty likely that your object is already in the
> cache.

I don't, of course.

>>
>>> Are you storing lots of inclusion maps (maybe
>>> CharacterSets)? If not, then IMHO it's not worth to adding this feature
>>> to this primitive, because runtime performance will be worse on both the
>>> image side and the VM side.
>>
>> Why?
>
> Filling the contents of a compact 32 sized ByteArray in the image takes
> more time than a 256 byte ByteArray. On the VM side memory access +
> shifts + bitands cost more than just memory access.

Uhm, it's a primitive. My understanding is that this would result in
"C shifts + bitands".

> But (as usual) a
> benchmark can tell if it's really worth to do it or not.

Is there documentation somewhere on how one adds a primitive and
compiles a VM (Cog ideally)?

Cheers
Philippe
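
For reference, the two lookup schemes discussed in this thread could be sketched in plain C roughly as follows. This is only an illustration, not the actual VM primitive: the function names (pack_inclusion_map, find_first_bytemap, find_first_bitmap) are made up for this sketch, and returning a 1-based index with 0 for "not found" is an assumption that mirrors the usual Smalltalk convention.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack a 256-entry 0/1 inclusion map into 32 bytes,
   one bit per character code. The top five bits of the code select the
   byte, the bottom three bits select the bit inside that byte. */
static void pack_inclusion_map(const uint8_t map256[256], uint8_t map32[32])
{
    for (int i = 0; i < 32; i++)
        map32[i] = 0;
    for (int c = 0; c < 256; c++)
        if (map256[c])
            map32[c >> 3] |= (uint8_t)(1u << (c & 7));
}

/* Byte-per-character scheme: one table load per input byte.
   Returns the 1-based index of the first included byte, or 0 if none. */
static size_t find_first_bytemap(const uint8_t *s, size_t len,
                                 const uint8_t map256[256])
{
    for (size_t i = 0; i < len; i++)
        if (map256[s[i]])
            return i + 1;
    return 0;
}

/* Bit-per-character scheme: one table load plus shifts and ands per
   input byte, against a table small enough for a single 64-byte line.
   Returns the 1-based index of the first included byte, or 0 if none. */
static size_t find_first_bitmap(const uint8_t *s, size_t len,
                                const uint8_t map32[32])
{
    for (size_t i = 0; i < len; i++) {
        uint8_t c = s[i];
        if ((map32[c >> 3] >> (c & 7)) & 1u)
            return i + 1;
    }
    return 0;
}
```

Timing both functions over the same input with the same set of included characters would be one way to run the benchmark suggested above; which variant wins likely depends on whether the table is already in cache when the call is made.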