Will Fiveash wrote:
> Recently I modified Solaris Kerberos so it used AES CBC mode from the
> user and kernel space CF. While doing this work Gary Morton and I
> determined that when Kerberos was using AES ECB mode with the Niagara II
> crypto provider (n2cp) performance was much worse than with the n2cp AES
> ECB mech disabled so that the AES software crypto provider was used.
> This makes sense because there is setup overhead when the n2cp does AES
> crypto and Kerberos using AES ECB mode makes a call to kCF/n2cp for
> every 16byte block. That was one of the reasons I modified Kerberos to
> use AES CBC but that aside, the point I am getting to is that it would
> be nice if the CF could determine the best provider automatically based
> on algorithm, mode and amount of data to be processed. In the case of
> Kerberos using AES ECB, the kCF would use the software provider instead
> of the n2cp.
>
> Is someone thinking about this and is there an RFE open?
There are two kinds of approaches you can use here.

If you're mostly worried about throughput, and don't care that much about the difference in latency, you can always schedule to hardware first, and fall back to software when the provider's queue is full. Of course, this assumes that the cost of queuing the job is less than the cost of setting up the DMA, etc. required for hardware acceleration. (And that the expensive part happens *after* the job is queued.) In this situation, you'll use hardware offload for the first group of jobs until you fill the queue for the provider, then you go to software.

I'm pretty sure this is the approach I took with the Deimos (SCA 1000) product on the Solaris 8 kcl library. (That code isn't open, and I no longer have access to it, so I can't check.)

Another, and perhaps better, approach is to do the same as above, but also factor in the cost of scheduling and the cost of processing each algorithm. You can probably swag a rough estimate that "setup" for any given crypto job costs "n" cycles. Then you can also say that for each algorithm, the cost of processing in software is "x" cycles per byte. In this case, any time "n" > "x * len", where len is the job's length in bytes, you should unconditionally go to software, because the hardware setup alone costs more than doing the whole job in software.

You should still use the queue failover mechanism above, though, so that the CPU cores of a multi-core system can participate in crypto alongside the crypto engines. (No point in having Niagara cores sitting around idle waiting for hardware crypto to complete.)

I've not looked at kCF in detail to see if this is how it schedules or not.

    -- Garrett
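
For illustration, here is a rough Python sketch of the combined heuristic described above: the "n" > "x * len" size check first, then failover to software when the hardware queue is full. This is not how kCF or the kcl library actually schedules. The names (HwProvider, choose_provider), the queue model, and all of the cycle counts are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class HwProvider:
    """Hypothetical model of a hardware crypto provider's job queue."""
    queue_limit: int   # assumed maximum number of outstanding jobs
    queued: int = 0    # jobs currently outstanding

    def queue_full(self) -> bool:
        return self.queued >= self.queue_limit


def choose_provider(length: int, setup_cycles: int,
                    sw_cycles_per_byte: int, hw: HwProvider) -> str:
    """Return "software" or "hardware" for a job of `length` bytes.

    setup_cycles is the estimated per-job hardware setup cost ("n");
    sw_cycles_per_byte is the estimated software cost per byte ("x").
    Both are rough guesses supplied by the caller, not measured values.
    """
    # Small job: hardware setup alone costs more than doing the whole
    # job in software, so go to software unconditionally.
    if setup_cycles > sw_cycles_per_byte * length:
        return "software"
    # Queue failover: when the hardware queue is full, use an otherwise
    # idle core rather than waiting for the crypto engine.
    if hw.queue_full():
        return "software"
    hw.queued += 1     # the job is now outstanding on the hardware queue
    return "hardware"


if __name__ == "__main__":
    hw = HwProvider(queue_limit=2)
    # Invented figures: n = 2000 cycles of setup, x = 20 cycles per byte.
    for nbytes in (16, 8192, 8192, 8192):
        print(nbytes, choose_provider(nbytes, 2000, 20, hw))
```

With these invented numbers, a single 16-byte AES block goes to software (2000 > 20 * 16), the first two 8 KB jobs go to hardware, and the third 8 KB job fails over to software because the two-entry queue is full. A real implementation would also have to drain the queue as jobs complete and would need per-algorithm values for "x".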