Type conversion from uint8 to uint32 (or uint64) is essentially free because 
general-purpose registers are 32 or 64 bits wide anyway.

If anything, the wider types are less costly: at a low level, loading an 8-bit value 
requires movzb (mov to register and zero-extend the byte; movzx in Intel syntax), which 
has slightly higher latency than the plain mov used for uint32 and uint64.
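
To make the point above concrete, here is a minimal C sketch. The function names `sum_u8` and `sum_u32` are hypothetical, made up for illustration. The only difference between the two loops is the element type being loaded. On x86-64, an unvectorized build would typically use a zero-extending byte load for the first and a plain 32-bit load for the second, but the exact instructions depend on your compiler and flags, so check the generated assembly rather than taking this as given.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sums n bytes into a 32-bit accumulator.
   The uint8 -> uint32 widening is folded into the load itself;
   there is no separate conversion step in the source. */
uint32_t sum_u8(const uint8_t *src, size_t n) {
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (uint32_t)src[i];
    }
    return acc;
}

/* Same loop over 32-bit elements: a plain load, no extension needed. */
uint32_t sum_u32(const uint32_t *src, size_t n) {
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += src[i];
    }
    return acc;
}
```

Compiling both with something like `gcc -O1 -S` and comparing the loop bodies is a quick way to see how your toolchain handles the two loads.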

The issue in your convolutions is a memory bottleneck, not compute.
