The big issue wasn't truncation; it was that I had put the wrong block
of code in an if/else branch. The current challenge is that the
memory-efficient attention implementation has no hook for applying
dropout (random zeroing of some attention weights during training) at
the point where the perceiver model applies it. I fudged something in,
untested.
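To make the shape of the problem concrete, here is a minimal sketch of chunked attention with dropout applied to each chunk's softmax weights, which is roughly where a standard attention layer applies it. The function name, chunking-over-queries strategy, and all parameters are illustrative assumptions, not the actual perceiver or efficient-attention code:

```python
import numpy as np

def chunked_attention(q, k, v, query_chunk=64, dropout_p=0.0, rng=None):
    """Attention computed one query chunk at a time to bound peak memory.

    Dropout is applied to the per-chunk attention weights (inverted
    dropout: surviving weights are rescaled by 1/(1-p)). Hypothetical
    sketch, not the real implementation.
    """
    rng = rng or np.random.default_rng(0)
    scale = 1.0 / np.sqrt(q.shape[-1])
    out = np.empty_like(q)
    for start in range(0, q.shape[0], query_chunk):
        qc = q[start:start + query_chunk]
        scores = (qc @ k.T) * scale
        scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)
        if dropout_p > 0.0:
            # zero a random subset of attention weights during training
            mask = rng.random(weights.shape) >= dropout_p
            weights = weights * mask / (1.0 - dropout_p)
        out[start:start + query_chunk] = weights @ v
    return out
```

Chunking over queries keeps dropout straightforward; the harder case, alluded to above, is when the implementation also chunks over keys, because the softmax (and hence the weights dropout would zero) is never materialised in full.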

commit 7628f3e4f32ac25b11774d939f2e16a20dd2a8fd (HEAD ->
memory-efficient-attention, xloem/memory-efficient-attention)
Author: xloem <[email protected]>
Date:   Thu Jan 27 12:38:13 2022 +0000

    wip efficient attention: organising separate parts to include dropout and application
