Inspired by a conversation a while ago, I invented a byte stuffing algorithm that has both the expected optimal 0.2% overhead _and_ is prefix preserving, meaning values that are close to each other in radix space before encoding stay close together afterwords. The exact bytes may change, but long runs of the same character on the input remain long runs on the output. It also requires no lookahead so can be done on the fly during the JudySL lookup process.
The motivation was partiallly COBS, which is a good byte stuffing algorithm but requires a 256 byte buffer, making it inpractical on uCs and the lookahead means that JudySL performance is killed. 444444409 and 444444410 end up begining with 844... and 9444.. respectively meaning none of the shared run of 4s can be shared when they should take up adjacent bits in a leaf tree. I call it cuckoo stuffing, as it is partially inspired by the cuckoo hashing algorithm, where a hash collision "ejects" the previous value which then has to be rehashed, with cuckoo stuffing an escape character occuring in the stream ejects the escape character causing a new one to be selected via a synchronized PRNG. details are here http://notanumber.net/archives/183/cuckoo-byte-stuffing-algorithm It also is very fast and lean, 7 bytes of memory overhead and implementable in less than a hundred cpu instructions. John -- John Meacham - http://notanumber.net/ ------------------------------------------------------------------------------ "Accelerate Dev Cycles with Automated Cross-Browser Testing - For FREE Instantly run your Selenium tests across 300+ browser/OS combos. Get unparalleled scalability from the best Selenium testing platform available Simple to use. Nothing to install. Get started now for free." http://p.sf.net/sfu/SauceLabs _______________________________________________ Judy-devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/judy-devel
