Fred,

>> The code is more than 4 times slower than the original 'process 8
>> bytes using RISBG' variant.
Oops -

>> memory accesses are killing you.

I do everything in less than 1K (plus TR-tables read) so....
.... or the TR performance. After I had completed the code I stumbled over some discussions here which included a statement that TR is very, very slow.

Ed - can the HIS data help?

--
Martin

Pi_cap_CPU - all you ever need around MWLC/SCRT/CMT in z/VSE
more at http://www.picapcpu.de
