Hi, I just received this mail and I think it should be posted here:
[Dirk, email bounced when I tried to send this to the lcms list, so I'm
emailing you directly with my comments. Feel free to forward this to
the list]

> > > we work with littleCMS and are very happy about the results. But besides
> > > the color quality I'm interested in the performance. We measured a
> > > calculation time of around 400 ms using a 1 Mega Pixel picture. Knowing

Some time ago I did some performance testing and tuning on lcms. My
thoughts on the matter (all of this related to 16 bit in/out,
device-link constructed 3D transformations at high quality):

- The standard lcms 1.12 library takes about 450ns/pixel on a "typical"
  PC from 2 years ago.

- Very minor performance tuning for the common 3 in/3 out, 16 bit
  in/out per channel situation enabled the library speed to be boosted
  to 150ns/pixel, i.e. nearly 3x faster. I believe this figure to be
  as good as or better than any other CMS I tested, as it results in
  about 7 Megapixels/s throughput for full 16 bit data (i.e. about
  40 Megabytes/s at 6 bytes per pixel).

- The tuning involved mainly:
  a) using a fast IEEE float to int conversion macro at a critical
     point (standard IEEE float to int conversion is notoriously slow
     on Intel).
  b) unrolling the core 3D LUT interpolation routine.
  In all, less than 100 lines of code changes or so.

- I did try lcms 1.14 and applied the above changes, but speed was
  much slower (~400ns/pixel). I have yet to look into why this is;
  however, it should be easy enough to retune 1.14 to get 150ns/pixel
  speed out of it. This is why I have not submitted the changes back
  to Marti for the library (although I did send him the 1.12 changes
  a couple of years ago).

- My view is that it is very hard to get below 100ns/pixel for a full
  3D transformation, because the bottleneck becomes memory access for
  the 3D LUT array.
  The 3D LUT is too large (for a high quality transformation) to fit
  into L2 cache, let alone L1. With memory typically cycling at 50ns
  for a random read, main memory is about 100x slower than the CPU
  these days (i.e. you can do about 100 CPU instructions in the time
  it takes to do one main-memory access). This difference between CPU
  and RAM speed is in my view the critical factor in performance
  tuning these days. It is not the CPU load (i.e. instructions
  executed) that matters; instead, the data flow load becomes the
  vital factor.

- Given the above RAM/CPU disparity, it might be possible to make
  improvements by moving away from 3D LUT device-link transformations
  (which have a heavy RAM load) to a description of the curve in a
  mathematical sense (such as a piece-wise set of polynomials). This
  would enable everything to fit into main CPU cache, and could
  potentially give throughput in the 10ns/pixel to 50ns/pixel range.
  This would also give a greater performance boost on multi-core
  machines (as the formulae can sit in L1 cache). Although this is an
  area I've worked on for other pipelines, I have not looked into it
  in much detail for CMS work, as most of my effort currently is on
  spectral based profiling & transformation (this is for digital
  cameras).

- Speaking of which, it seems that recent products on the market are
  doing major 'cheats' to get high speed. In effect, they are
  bypassing true CMS work and doing simple hacks (I think; I've not
  looked at their code). In short, the emphasis now seems to be on
  speed, not on accuracy.
  If you are trying to write products that compete or compare with
  some of the photo editing products, this is worth keeping in mind:
  many products now are at best doing simple matrix transformations,
  and at worst not even doing that.

My $0.02 anyway.

Regards,

Stuart
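
For readers wondering what tuning points a) and b) in the mail might look like in practice, here is a minimal sketch in C. It is not Stuart's patch and not lcms code: the names `fast_round_to_int`, `Lut3D`, `lerp16`, `split16` and `lut3d_eval`, and the table layout, are all made up for illustration. Part a) uses the widely known "magic number" trick, which assumes IEEE 754 doubles and the default round-to-nearest mode. Part b) is a trilinear 3D LUT lookup with the corner and channel loops written out by hand.

```c
#include <stdint.h>
#include <string.h>

/* a) Round a double to the nearest int32 without a float-to-int cast.
   Adding 2^52 + 2^51 leaves the rounded integer in the low 32 bits of
   the mantissa. Assumes IEEE 754 doubles, round-to-nearest mode and a
   result that fits in int32. Ties round to even. */
static int32_t fast_round_to_int(double v)
{
    double t = v + 6755399441055744.0;      /* 2^52 + 2^51 */
    uint64_t bits;
    uint32_t low;
    int32_t result;

    memcpy(&bits, &t, sizeof bits);         /* read the bit pattern */
    low = (uint32_t)bits;                   /* keep the low 32 bits */
    memcpy(&result, &low, sizeof result);   /* reinterpret as signed */
    return result;
}

/* b) Hypothetical 3-in/3-out, 16-bit LUT layout (an assumption). */
typedef struct {
    int n;                  /* grid points per axis, at least 2 */
    const uint16_t *table;  /* n*n*n*3 values, output channel varies fastest */
} Lut3D;

/* Linear interpolation between a and b with weight f / 65535. */
static int32_t lerp16(int32_t a, int32_t b, int32_t f)
{
    return a + (int32_t)(((int64_t)(b - a) * f) / 65535);
}

/* Split a 16-bit input into a grid cell index and a 0..65535 weight. */
static void split16(uint16_t in, int n, int *cell, int32_t *frac)
{
    uint32_t p = (uint32_t)in * (uint32_t)(n - 1);
    *cell = (int)(p / 65535u);
    *frac = (int32_t)(p % 65535u);
    if (*cell == n - 1) {       /* top edge: use the last cell fully */
        *cell = n - 2;
        *frac = 65535;
    }
}

/* Trilinear lookup with the 8 corners and 3 channels unrolled. */
static void lut3d_eval(const Lut3D *lut, const uint16_t in[3], uint16_t out[3])
{
    const int n = lut->n;
    const int sz = 3, sy = 3 * n, sx = 3 * n * n;   /* strides per axis */
    int cx, cy, cz;
    int32_t fx, fy, fz;
    const uint16_t *c000, *c001, *c010, *c011, *c100, *c101, *c110, *c111;

    split16(in[0], n, &cx, &fx);
    split16(in[1], n, &cy, &fy);
    split16(in[2], n, &cz, &fz);

    c000 = lut->table + cx * sx + cy * sy + cz * sz;
    c001 = c000 + sz;  c010 = c000 + sy;  c011 = c010 + sz;
    c100 = c000 + sx;  c101 = c100 + sz;  c110 = c100 + sy;  c111 = c110 + sz;

#define TRILERP(ch) \
    lerp16(lerp16(lerp16(c000[ch], c001[ch], fz),            \
                  lerp16(c010[ch], c011[ch], fz), fy),       \
           lerp16(lerp16(c100[ch], c101[ch], fz),            \
                  lerp16(c110[ch], c111[ch], fz), fy), fx)

    out[0] = (uint16_t)TRILERP(0);
    out[1] = (uint16_t)TRILERP(1);
    out[2] = (uint16_t)TRILERP(2);
#undef TRILERP
}
```

Note that each lookup still reads eight table corners that can sit far apart in memory, which is exactly the RAM traffic the mail identifies as the bottleneck; unrolling only trims the instruction overhead around those reads.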

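The piece-wise polynomial idea from the mail can be illustrated in one dimension. The sketch below is only an assumption about what such a representation might look like: `Segment` and `piecewise_eval` are hypothetical names, the segments are cubics evaluated with Horner's rule, and a real colour transform would need a multivariate fit whose accuracy is not addressed here. The point is just that a handful of coefficients is small enough to stay in L1 cache, unlike a large 3D LUT.

```c
#include <stddef.h>

/* One cubic segment, valid from x0 up to the next segment's x0. */
typedef struct {
    double x0;      /* start of the segment */
    double c[4];    /* y = c[0] + c[1]*t + c[2]*t^2 + c[3]*t^3, t = x - x0 */
} Segment;

/* Evaluate a piece-wise cubic. Segments must be sorted by x0 and
   count must be at least 1. Inputs below the first x0 use segment 0. */
static double piecewise_eval(const Segment *seg, size_t count, double x)
{
    size_t i = count - 1;
    double t;
    const double *c;

    /* Linear search from the top: fine for a handful of segments. */
    while (i > 0 && x < seg[i].x0)
        i--;

    t = x - seg[i].x0;
    c = seg[i].c;
    return ((c[3] * t + c[2]) * t + c[1]) * t + c[0];   /* Horner's rule */
}
```

With, say, eight segments the whole table is 8 * 5 doubles = 320 bytes, so every evaluation works from cache and costs three multiply-adds plus the segment search.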