[email protected] (Niels Möller) writes:
> Simon Josefsson <[email protected]> writes:
>
>> Actually, sleeping on this, I realized that we really want to export the
>> Salsa20 core primitive (this was what I actually needed), and that is
>> the primitive that should be implemented in assembler. I've fixed this
>> in the attached patch.
>>
>> The Salsa20 core is a hash function (not your typical hash function
>> though) described here:
>
> I guess it could be named salsa20_hash, then? (I think there was such a
> function in a previous version of the code).
The name of the hash is "Salsa20 core" but I think little effort has
gone into tightening up the documentation around the Salsa20 hash (for
example, there are no test vectors that I could find). salsa20_hash
works for me, but could be confusing as it isn't a normal hash.
>> If we implement that quickly in assembler, with a variable round
>> parameter, that will be sufficient to build fast C code around.
>
> Then you'd first write the hash output to memory, then read it back to
> xor it with the message. Since salsa20 is pretty fast, I think you'll get
> a measurable performance penalty compared to the current code which
> keeps the hash output in registers until it is xored to the message.
Right, good point.
> You really need to get just the hash output, without xoring it to
> anything?
Yes, although if necessary I could xor it to a zero buffer if there were
no other way... however I'll lose performance, and my application
(scrypt) would benefit from good performance.
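
For illustration, the zero-buffer workaround could look roughly like the
sketch below. Note that toy_stream_xor and toy_keystream are hypothetical
names, and the keystream is a toy byte pattern rather than Salsa20; the
only point is that xoring into zeros gives back the raw output, at the
cost of an extra memset and xor pass.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an xor-only stream interface: xors a
   keystream into src and writes the result to dst.  The keystream is a
   toy byte pattern, NOT Salsa20. */
static void
toy_stream_xor (uint8_t *dst, const uint8_t *src, size_t length)
{
  size_t i;
  for (i = 0; i < length; i++)
    dst[i] = src[i] ^ (uint8_t) (0xA5 + 7 * i);
}

/* Recover the raw keystream through the xor-only interface by feeding
   it zeros, since k ^ 0 == k.  In-place operation is fine here because
   each output byte depends only on the input byte at the same index. */
static void
toy_keystream (uint8_t *dst, size_t length)
{
  memset (dst, 0, length);
  toy_stream_xor (dst, dst, length);
}
```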
> It would definitely be cleaner to have the hash function separately.
I agree.
>> +salsa20_core (uint32_t src[_SALSA20_INPUT_LENGTH],
>> + uint32_t dst[_SALSA20_INPUT_LENGTH],
>> + unsigned rounds)
> [...]
>> + for (i = 0;i < _SALSA20_INPUT_LENGTH;++i)
>> + {
>> + uint32_t t = x[i] + src[i];
>> + dst[i] = LE_SWAP32 (t);
>> + }
>> +}
>
> This makes for a very peculiar interface for a non-internal function. It
> would make more sense from an interface perspective to either not do
> these byte swaps, or have the output parameter be of type uint8_t *. Or
> do something like the union gcm_block in gcm.h (although that's also not
> pretty), if we want to be able to store the byte swapped value with a
> word-sized store.
Let's use uint8_t. The first sentence of the Salsa20 core webpage is:

  The Salsa20 core is a function from 64-byte strings to 64-byte
  strings: the Salsa20 core reads a 64-byte string x and produces a
  64-byte string Salsa20(x).

So that is consistent with uint8_t.
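
A minimal sketch of what a byte-oriented interface might look like,
assuming a hypothetical name salsa20_core_bytes and an even round count.
This is not the patch itself, just the same structure with explicit
little-endian loads and stores so that the caller only ever sees bytes:

```c
#include <stdint.h>

#define ROTL32(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

/* One Salsa20 quarter round on four state words. */
#define QROUND(a, b, c, d) do { \
    b ^= ROTL32 (a + d, 7);     \
    c ^= ROTL32 (b + a, 9);     \
    d ^= ROTL32 (c + b, 13);    \
    a ^= ROTL32 (d + c, 18);    \
  } while (0)

static uint32_t
load_le32 (const uint8_t *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
    | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

static void
store_le32 (uint8_t *p, uint32_t w)
{
  p[0] = w & 0xff;
  p[1] = (w >> 8) & 0xff;
  p[2] = (w >> 16) & 0xff;
  p[3] = (w >> 24) & 0xff;
}

/* Hypothetical interface: 64 bytes in, 64 bytes out.  rounds is assumed
   to be even (8, 12 or 20 in practice).  dst may equal src, since the
   input is copied into words before anything is written. */
void
salsa20_core_bytes (uint8_t dst[64], const uint8_t src[64], unsigned rounds)
{
  uint32_t in[16], x[16];
  unsigned i;

  for (i = 0; i < 16; i++)
    x[i] = in[i] = load_le32 (src + 4 * i);

  for (i = 0; i < rounds; i += 2)
    {
      /* Column round. */
      QROUND (x[0], x[4], x[8], x[12]);
      QROUND (x[5], x[9], x[13], x[1]);
      QROUND (x[10], x[14], x[2], x[6]);
      QROUND (x[15], x[3], x[7], x[11]);
      /* Row round. */
      QROUND (x[0], x[1], x[2], x[3]);
      QROUND (x[5], x[6], x[7], x[4]);
      QROUND (x[10], x[11], x[8], x[9]);
      QROUND (x[15], x[12], x[13], x[14]);
    }

  /* Feed-forward: add the original input words, store little-endian. */
  for (i = 0; i < 16; i++)
    store_le32 (dst + 4 * i, x[i] + in[i]);
}
```

On a little-endian machine with suitably aligned buffers the loads and
stores could be plain word accesses, so the byte interface need not
prevent doing the processing as word operations.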
> I don't remember precisely the background of the current implementation,
> but I think the point was to do as much as possible of the processing as
> word operations, including the byte swapping.
Yes that will be faster I suppose.
/Simon