I'm basically thinking of this:
I'd train a tokenizer on the expected data (short sentences and a
ton of Logo code), swap the old tokenizer for the new one, and then
finetune the model until the whole system recovered its original behavior.
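The tokenizer-training step could be sketched roughly like this: a minimal pure-Python BPE merge learner over a tiny, made-up Logo corpus. In practice you'd use a library such as Hugging Face `tokenizers`; the corpus, vocabulary size, and merge count here are all illustrative assumptions, not anything from the original plan.

```python
from collections import Counter

def train_bpe(corpus, num_merges):
    """Learn BPE-style merges from an iterable of text lines (toy sketch)."""
    # Start with each whitespace-separated word as a list of characters.
    words = [list(w) for line in corpus for w in line.split()]
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the whole corpus.
        pairs = Counter()
        for w in words:
            pairs.update(zip(w, w[1:]))
        if not pairs:
            break
        best = max(pairs, key=pairs.get)   # most frequent pair
        merges.append(best)
        merged = best[0] + best[1]
        # Apply the merge everywhere before counting again.
        new_words = []
        for w in words:
            out, i = [], 0
            while i < len(w):
                if i + 1 < len(w) and (w[i], w[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(w[i])
                    i += 1
            new_words.append(out)
        words = new_words
    return merges

# Hypothetical mini-corpus of Logo commands.
corpus = [
    "repeat 4 [ fd 100 rt 90 ]",
    "repeat 360 [ fd 1 rt 1 ]",
]
merges = train_bpe(corpus, 5)
print(merges)
```

The idea is that frequent Logo fragments like `fd` or `repeat` get merged into single tokens early, so the new vocabulary is dense in exactly the symbols the finetuned model will see most.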

I'd like to fold more goals into the approach, but just getting through
the hard part would be satisfying enough that this one goal might be
fine for now.