... by the way, the code changes from Stuart Nixon are being incorporated. Some of the modifications are already applied in CVS. Hopefully the next version will include all of the changes.
Regards,
--
Marti Maria
The littlecms project.
www.littlecms.com
----- Original Message -----
From: "Dirk Ströker" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Friday, March 25, 2005 4:20 PM
Subject: Re: [Lcms-user] CMM and Performance
Hi,
I just received this mail and I think it should be posted here:
[Dirk, email bounced when I tried to send this to the lcms list, so I'm emailing you directly with my comments. Feel free to forward this to the list]
> > we work with littleCMS and are very happy about the results. But besides
> > the color quality I'm interested in the performance. We measured a
> > calculation time of around 400 ms using a 1 Mega Pixel picture.
> > Knowing
Some time ago I did some performance testing and tuning on lcms. My thoughts on the matter (all of this related to 16 bit in/out, device-link constructed 3D transformations at high quality):
- The standard lcms 1.12 library takes about 450ns/pixel on a "typical" PC from 2 years ago.
- Very minor performance tuning for the common 3 in/3 out, 16 bit in/out per channel situation boosted the library speed to 150ns/pixel, i.e. about 3x faster. I believe this figure to be as good as or better than any other CMS I tested, as it works out to about 7 Megapixels/second throughput for full 16 bit data (i.e. about 40 Megabytes/second).
- The tuning mainly involved: a) using a fast IEEE float to int conversion macro at a critical point (the standard float to int conversion is notoriously slow on Intel), and b) unrolling the core 3D LUT interpolation routine. In all, fewer than 100 lines of code changed or so.
- I did try lcms 1.14 and applied the above changes, but it was much slower (~400ns/pixel). I have yet to look into why this is; however, it should be easy enough to retune 1.14 to get 150ns/pixel out of it. This is why I have not passed the changes back to Marti for the library (although I did send him the 1.12 changes a couple of years ago).
- My view is that it is very hard to get below 100ns/pixel for a full 3D transformation, because the bottleneck becomes memory access for the 3D LUT array. The 3D LUT is too large (for a high quality transformation) to fit into L2 cache, let alone L1. With memory typically cycling at 50ns for a random read, main memory is about 100x slower than the CPU these days (i.e. you can execute about 100 CPU instructions in the time it takes to do one main-memory access). This gap between CPU and RAM speed is in my view the critical factor in performance tuning these days. It is not the CPU load (i.e. instructions executed) that matters; the data flow load is the vital factor.
- Given the above RAM/CPU disparity, it might be possible to make improvements by moving away from 3D LUT device-link transformations (which have a heavy RAM load) to a mathematical description of the curve (such as a piece-wise set of polynomials). This would enable everything to fit into the main CPU cache, and could potentially bring the cost down to the 10ns/pixel to 50ns/pixel range. It would also give a greater performance boost on multi-core machines (as the formulae can sit in L1 cache). Although this is an area I've worked on for other pipelines, I have not looked into it in much detail for CMS work, as most of my effort currently is on spectral based profiling & transformation (this is for digital cameras).
- Speaking of which, it seems that recent products on the market are doing major 'cheats' to get high speed. In effect, they are bypassing true CMS work and doing simple hacks (I think - I've not looked at their code). In short, the emphasis now seems to be on speed, not accuracy. If you are trying to write products that compete or compare with some of the photo editing products, this is worth keeping in mind - many products now are at best doing simple matrix transformations, and at worst not even doing that.
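
A minimal C sketch of the arithmetic behind the points above, for anyone who wants to check the numbers. It is illustrative only and is not the patch that was sent to Marti. The helper names (`fast_round`, `lut_bytes`, `mpixels_per_second`, `mbytes_per_second`) are hypothetical, and the grid size passed to `lut_bytes` is an assumed parameter. The mail does not say which float to int macro was used; the magic-number rounding shown here is one well-known way to avoid the slow cast, and it assumes IEEE 754 doubles, the default round-to-nearest mode and inputs that fit in 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round a double to the nearest 32-bit integer without a float-to-int cast.
   Adding 1.5 * 2^52 moves the integer part of x into the low mantissa bits.
   ASSUMPTIONS: IEEE 754 doubles, round-to-nearest mode, |x| < 2^31.
   Hypothetical helper, not the actual macro from the lcms changes. */
static inline int32_t fast_round(double x)
{
    double biased = x + 6755399441055744.0;   /* 1.5 * 2^52 */
    uint64_t bits;
    uint32_t low;
    int32_t result;

    memcpy(&bits, &biased, sizeof bits);      /* well-defined type pun */
    low = (uint32_t)(bits & 0xFFFFFFFFu);     /* low 32 mantissa bits */
    memcpy(&result, &low, sizeof result);     /* reinterpret as two's complement */
    return result;
}

/* Size in bytes of a 3 in/3 out 3D LUT with 16-bit entries.
   `grid` is the number of nodes per axis (an assumed parameter). */
static inline size_t lut_bytes(size_t grid)
{
    assert(grid > 0);
    return grid * grid * grid * 3 * sizeof(uint16_t);
}

/* Convert a per-pixel cost in nanoseconds to Megapixels per second. */
static inline double mpixels_per_second(double ns_per_pixel)
{
    assert(ns_per_pixel > 0.0);
    return 1000.0 / ns_per_pixel;             /* 1e9 ns/s divided by 1e6 */
}

/* Megabytes per second for a given per-pixel cost and pixel size in bytes
   (3 channels x 16 bit = 6 bytes per pixel for the case discussed above). */
static inline double mbytes_per_second(double ns_per_pixel, size_t bytes_per_pixel)
{
    return mpixels_per_second(ns_per_pixel) * (double)bytes_per_pixel;
}
```

For example, `mpixels_per_second(150.0)` is about 6.7 and `mbytes_per_second(150.0, 6)` is about 40, which matches the figures quoted above. At 450ns/pixel a 1 Megapixel image takes about 450 ms, the same order as the 400 ms measured in the original question. `lut_bytes(33)` comes to 215,622 bytes and `lut_bytes(49)` to 705,894 bytes, which can be compared against the cache sizes of the machine in question.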
My $0.02 anyway.
Regards,
Stuart
_______________________________________________
Lcms-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/lcms-user
