Re: [Lcms-user] CMM and Performance

Dirk Str嚙糊er Fri, 25 Mar 2005 07:21:30 -0800

Hi,

I just received this mail and I think it should be posted here:

[Dirk, email bounced when I tried to send this to the lcms list, so I'm
emailing you directly with my comments. Feel free to forward this to the list]

> > > We work with littleCMS and are very happy about the results. But besides
> > > the color quality I'm interested in the performance. We measured a
> > > calculation time of around 400 ms using a 1 Mega Pixel picture. Knowing

Some time ago I did some performance testing and tuning
on lcms. My thoughts on the matter (all of this related to
16 bit in/out, device-link constructed 3D transformations at
high quality):

-   The standard lcms 1.12 library takes about 450ns/pixel on a "typical"
    PC from 2 years ago.

-   Very minor performance tuning for the common 3 in/3 out, 16 bit
    in/out per channel situation enabled the library speed to be boosted
    to 150ns/pixel, i.e. about 3x faster. I believe this
    figure to be as good as or better than any other CMS I tested,
    as it results in about 7 Megapixels/s throughput for full 16 bit
    data (i.e. about 40 Megabytes/s).
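
The throughput figures follow from simple arithmetic. A rough sketch,
assuming 3 channels at 2 bytes each (6 bytes per pixel) and taking
"1 Mega Pixel" as 1,000,000 pixels; the function names are made up for
illustration only:

```c
/* Pixels processed per second at a given per-pixel cost. */
static double pixels_per_second(double ns_per_pixel)
{
    return 1e9 / ns_per_pixel;
}

/* Data rate in bytes per second, counting one direction only
 * (either the input stream or the output stream). */
static double bytes_per_second(double ns_per_pixel, int channels,
                               int bytes_per_channel)
{
    return pixels_per_second(ns_per_pixel) * channels * bytes_per_channel;
}

/* Milliseconds needed to transform an image of pixel_count pixels. */
static double ms_per_image(double ns_per_pixel, double pixel_count)
{
    return ns_per_pixel * pixel_count / 1e6;
}
```

At 150ns/pixel this gives roughly 6.7 million pixels and 40 million bytes
per second, or 150 ms for a 1,000,000 pixel image. At 400ns/pixel the same
image takes 400 ms, which matches the figure in the original question.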

-   The tuning involved mainly:
    a) Using a fast IEEE to int conversion macro at a critical point
    (standard IEEE to int performance is notoriously slow on Intel).
    b) Unrolling the core 3D LUT interpolation routine.
    In all, fewer than 100 lines of code changes or so.
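
For readers unfamiliar with point (a), here is a minimal sketch of one
well-known fast double-to-int trick. It is not necessarily the macro used
in the tuning described above. It assumes IEEE 754 binary64 doubles, the
default round-to-nearest mode, and inputs with magnitude below 2^31; the
name fast_round_to_int is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Round a double to the nearest 32-bit integer without a float-to-int
 * conversion instruction. Adding 2^52 + 2^51 pushes the value into a
 * range where one unit in the last place equals 1.0, so the FPU does
 * the rounding and the integer ends up in the low mantissa bits. */
static int32_t fast_round_to_int(double x)
{
    const double magic = 6755399441055744.0;   /* 2^52 + 2^51 */
    double shifted = x + magic;
    uint64_t bits;
    uint32_t low;
    int32_t result;

    memcpy(&bits, &shifted, sizeof bits);   /* read the raw bit pattern */
    low = (uint32_t)bits;                   /* keep the low 32 bits */
    memcpy(&result, &low, sizeof result);   /* reinterpret as signed */
    return result;
}
```

Note that this rounds to nearest with ties going to the even integer
(2.5 becomes 2, 3.7 becomes 4), whereas a plain C cast truncates toward
zero, so the two are not drop-in replacements for each other.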

-   I did try lcms 1.14 and applied the above changes, but speed
    was much slower (~400ns/pixel). I have yet to look into why this
    is; however, it should be easy enough to retune 1.14 to get 150ns/pixel
    speed out of it. This is why I have not sent the changes back to Marti
    for the library (although I did send him the 1.12 changes a couple of
    years ago).

-   My view is that it is very hard to get below 100ns/pixel
    for a full 3D transformation, because the bottleneck becomes
    memory access for the 3D LUT array. The 3D LUT is too large
    (for a high quality transformation) to fit into L2, let alone
    L1 cache. With memory typically cycling at 50ns for a random
    read, main memory is about 100x slower than the CPU these days
    (i.e. you can do about 100 CPU instructions in the time it takes
    to do one main-memory access). This difference between
    CPU and RAM speed is in my view the critical factor in
    performance tuning these days. It is not the CPU load
    (i.e. instructions executed) that matters; instead the
    data flow load becomes the vital factor.
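
To put a number on the LUT size, a small sketch may help. The grid sizes
and cache sizes below are assumptions for illustration; the message does
not say which grid resolution was used, and the function names are
hypothetical.

```c
#include <stddef.h>

/* Bytes needed for a 3D LUT with grid_points nodes along each of the
 * three input axes and out_channels values stored per node. */
static size_t lut3d_bytes(size_t grid_points, size_t out_channels,
                          size_t bytes_per_value)
{
    return grid_points * grid_points * grid_points
         * out_channels * bytes_per_value;
}

/* Nonzero if the whole table would fit in a cache of cache_bytes,
 * ignoring everything else competing for that cache. */
static int lut3d_fits_in_cache(size_t grid_points, size_t out_channels,
                               size_t bytes_per_value, size_t cache_bytes)
{
    return lut3d_bytes(grid_points, out_channels, bytes_per_value)
           <= cache_bytes;
}
```

With 3 outputs of 2 bytes each, a 33-point grid needs 215,622 bytes and a
49-point grid needs 705,894 bytes, so either is far larger than an L1 data
cache of a few tens of kilobytes. The image data streaming through the
same caches makes the real situation worse than this estimate suggests.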

-   Given the above RAM/CPU disparity, it might be possible
    to make improvements by moving away from a 3D LUT
    device-link transformation (which has a heavy RAM load)
    to a description of the curve in a mathematical sense
    (such as a piece-wise set of polynomials). This would
    enable everything to fit into main CPU cache, and could
    potentially give throughput in the 10ns/pixel to 50ns/pixel range.
    This would also give a greater performance boost on multi-core
    machines (as the formulae can sit in L1 cache).
    Although this is an area I've worked on for other
    pipelines, I have not looked into this in much detail for
    CMS work, as most of my effort currently is on spectral based
    profiling & transformation (this is for digital cameras).
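
As a rough illustration of the piece-wise polynomial idea, here is a
one-dimensional sketch only. A real device-link replacement would need a
multivariate fit over three inputs, which the message does not describe.
The Segment type and piecewise_eval are hypothetical names.

```c
#include <stddef.h>

/* One cubic piece: y = c[0] + c[1]*t + c[2]*t^2 + c[3]*t^3, t = x - x0.
 * A piece is valid from its x0 up to the x0 of the next piece. */
typedef struct {
    double x0;
    double c[4];
} Segment;

/* Evaluate a curve stored as `count` pieces sorted by ascending x0.
 * The coefficient table is tiny, so it stays cache resident. */
static double piecewise_eval(const Segment *segs, size_t count, double x)
{
    size_t i = 0;
    double t;
    const double *c;

    /* Linear search is fine for a handful of pieces. */
    while (i + 1 < count && x >= segs[i + 1].x0)
        i++;

    t = x - segs[i].x0;
    c = segs[i].c;
    /* Horner's rule: three multiplies and three adds per evaluation. */
    return ((c[3] * t + c[2]) * t + c[1]) * t + c[0];
}
```

The trade is a few extra arithmetic operations per pixel in exchange for
avoiding scattered reads from a large table, which is the data-flow
argument made in the previous point.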

-   Speaking of which, it seems that recent products on the
    market are doing major 'cheats' to get high speed. In
    effect, they are bypassing true CMS work and doing
    simple hacks (I think - I've not looked at their code).
    In short, the emphasis now seems to be on speed not
    on accuracy. If you are trying to write products that
    compete or compare with some of the photo editing products,
    this is worth keeping in mind - many products now are
    at best doing simple matrix transformations, and at
    worst not even doing that.
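
For comparison, a "simple matrix transformation" in this sense is just a
3x3 multiply per pixel, sketched below with a hypothetical mat3_apply. It
needs no table lookups at all, which is why it is fast, but it can only
express a linear mapping between the two color spaces.

```c
/* out = m * in, treating each pixel as a 3-component column vector. */
static void mat3_apply(const double m[3][3], const double in[3],
                       double out[3])
{
    int row;

    for (row = 0; row < 3; row++) {
        out[row] = m[row][0] * in[0]
                 + m[row][1] * in[1]
                 + m[row][2] * in[2];
    }
}
```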

My $0.02 anyway.

Regards,

Stuart
