Adapters do indeed support training embeddings; it's a parameter passed when enabling adapter training: https://github.com/adapter-hub/adapter-transformers/pull/245/files#diff-b31f98a320a05bd7744546d866cb04c4ac086ffae583745b969093c17d5cde6dR205
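For context, this is roughly how I understand that flag would be used; the model class, tokens, and adapter name below are placeholders I picked for illustration, so treat the exact calls as assumptions rather than a verified recipe:

```python
from transformers import AutoTokenizer, AutoModelWithHeads

model = AutoModelWithHeads.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# add the missing tokens and grow the embedding matrix to match
tokenizer.add_tokens(["[domain_token_1]", "[domain_token_2]"])
model.resize_token_embeddings(len(tokenizer))

model.add_adapter("domain_adapter")
# the flag from the linked diff: keep the embedding layer trainable so the
# new rows are learned alongside the adapter weights
model.train_adapter("domain_adapter", train_embeddings=True)
```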
It looks like the trained embeddings are then neither saved nor used unless additional functions are called to save and load them.

Another option would be using the vanilla tokenizer and simply replacing missing tokens with unique strings (rough sketch at the end of this comment). This would keep RAM usage down, but the training would not be quite as powerful since the embeddings are not included, and it would make it hard to process data containing the missing strings. I'm thinking the vanilla tokenizer might be the way to go for me, to reduce the areas for delay and error. Additionally, the frankensteined tokenizers have begun decreasing their loss :S
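For reference, here is roughly what I mean by the replacement idea; it's just a sketch with a made-up placeholder mapping, using the standard Hugging Face AutoTokenizer:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# made-up mapping from strings missing in the vanilla vocab to unique
# stand-in strings built from pieces the tokenizer already knows
PLACEHOLDERS = {
    "missing_token_a": "unique placeholder alpha",
    "missing_token_b": "unique placeholder beta",
}

def replace_missing_tokens(text: str) -> str:
    # plain string replacement; the inverse mapping would be needed later
    # to recover the original strings from model output
    for missing, stand_in in PLACEHOLDERS.items():
        text = text.replace(missing, stand_in)
    return text

encoded = tokenizer(replace_missing_tokens("some input containing missing_token_a"))
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
```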
