So, the thing to do here is apparently to use a language adapter: a small module 
that adapts embeddings trained for one model so that only minimal further 
training is needed.

If training one's own tokenizer, it would make sense to reduce the vocab size 
so there are fewer embeddings, but you could also just use a tokenizer from any 
model trained on similar data, together with a language adapter.
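To make the adapter idea concrete, here's a minimal sketch, assuming nothing about any particular adapter library: treat the adapter as a learned linear map between a donor model's embedding space and the target space, fit on whatever shared tokens the two vocabularies have. All shapes and data below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "donor" embeddings (e.g. from another model's vocab) and "target"
# embeddings for the handful of tokens both vocabs share. Dimensions are
# hypothetical.
src = rng.normal(size=(50, 16))       # 50 shared tokens, 16-dim donor space
true_map = rng.normal(size=(16, 8))   # hidden ground-truth relation (toy)
tgt = src @ true_map                  # 8-dim target space

# The "adapter" here is just a linear projection fit by least squares --
# far cheaper than retraining embeddings from scratch.
W, *_ = np.linalg.lstsq(src, tgt, rcond=None)

# Any donor embedding can now be mapped into the target space.
new_token_emb = rng.normal(size=(1, 16))
adapted = new_token_emb @ W
print(adapted.shape)  # (1, 8)
```

Real adapters are usually trained with gradient descent inside the network, but the least-squares fit shows why so little training is needed when the two spaces are roughly linearly related.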

RWKV does long context as well and is starting to take off; in their chat 
somebody mentioned making a mobile app that uses it. No adapters yet.

I have downtime at the moment as I can barely move; my limbs spasm when I try 
to stand up or perform the fine motor tasks needed to move forward on these 
projects. It passes with time.

Still keeping the embeddings training away on Colab.

Excited to eventually fix that data bug.
