> However, I recommend hacking in libz first, making it work with
> gzip, and start porting it to the kernel next step. Debugging
> and benchmarking in userspace is *so* much easier.

Ah, I tried that. Unfortunately the zlib in the kernel is a heavily modified zlib, and I was not able to compile it in user space. So I learnt to write kernel modules instead (it took just two days; it is so simple compared to Windows drivers that I still cannot believe it). There is also a comment in the kernel zlib saying that user-space support was removed...

> Also, I'd be very surprised if such an obvious optimization
> hadn't been tried already in 20+ years of gzip. Try digging
> around: you may find that it's not worth it.

The optimization I have in mind is absolutely Geode-specific. First, it needs some prefetching, and secondly, the inflate code has a lot of branches. The Geode appears to have a very simple 1-bit branch predictor (it behaves like one, but this is not documented), so it can waste 20-40 cycles on every run (every length/distance code). I know it is hard to beat a C compiler nowadays, so I am sure that simply rewriting the code in asm would not speed things up (more than 5 years ago LZO had several asm implementations for the 486/586/686, but ironically all of them were slower than the compiler-generated code).
Now that you have mentioned that jffs2 uses only 4K blocks, it is possible that the bottleneck is not in inffast.c after all. Do you have ANY perf/profile data, please? All I would like to know is whether the bottleneck lies in inffast or not:

/* When large enough input and output buffers are supplied to inflate(),
   for example, a 16K input buffer and a 64K output buffer, more than
   95% of the inflate execution time is spent in this routine. */

_______________________________________________
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel
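P.S. To partly answer my own question about profile data: a per-symbol kernel profile with oprofile would settle it, roughly along these lines (a sketch; the vmlinux path and workload are assumptions, and it needs root):

```shell
# Rough sketch: profile the kernel while exercising jffs2 reads,
# to see whether inflate_fast (inffast.c) really dominates.
# Assumes oprofile is installed and an uncompressed vmlinux is
# available (the path below is an assumption).
opcontrol --vmlinux=/boot/vmlinux
opcontrol --start

# ... exercise jffs2 here, e.g. read large files to /dev/null ...

opcontrol --stop
# Per-symbol breakdown: if inffast.c is the bottleneck,
# inflate_fast should be near the top of this list.
opreport --symbols | head -20
```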