"mathematical purity" ... how you can use vector/tensor algebra with texts
I'd suggest using the search word "embeddings" instead of "tensor".
The concept is being used in other fields, even physics, but (sticking
with linguistics) if you've not looked into Word2Vec yet, it is a good
place to appreciate how human language and linear algebra come together.
It is normally introduced as a ready-made model of dim 300, trained on
millions of words. Like you, I wanted to understand what it was actually
doing, so a few years ago I did a presentation using just two dimensions
and a handful of words and sentences, then plotted the embeddings found
for each word. You can add or remove a sentence at a time to see what it
is learning from each.
You can see how each dimension is being given some meaning, even if the
result is not how a human linguist would have structured things.
It is also a good test bed for finding the limits, such as playing
around with ambiguous words and proper nouns, increasing the amount of
training data without increasing dimension, etc.
Darren
P.S. The embedding layer is the first layer in transformers, the layer
where tokens ("words") are turned into numbers, typically of dim 512 or
higher. But note that they are randomly generated, not initialized from
word2vec or similar. And any modification to their initial randomness is
to please the layers above, not humans trying to peer inside the box.
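The embedding layer itself is just a randomly initialized lookup table
from token ids to vectors. A sketch, where the vocabulary size and the
init scale are illustrative and d_model=512 matches the figure above:

```python
import numpy as np

# A transformer-style embedding layer: random init, no word2vec warm start.
vocab_size, d_model = 1000, 512
rng = np.random.default_rng(42)
embedding = rng.normal(scale=0.02, size=(vocab_size, d_model))

# Token ids as a tokenizer might produce them (values are arbitrary here).
token_ids = np.array([17, 3, 101])

# The "layer" is a row lookup: one d_model-sized vector per token.
vectors = embedding[token_ids]
print(vectors.shape)    # (3, 512)
```

During training these rows are updated by backpropagation like any other
weights, which is why they end up shaped for the layers above rather than
for human inspection.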
P.P.S. I think you might also enjoy
https://transformer-circuits.pub/2021/framework/index.html which
explores how transformers work at a very low level.
The gap between their minimalist models and something like ChatGPT is
huge, though, and reading their work isn't going to help you appreciate
why ChatGPT says stupid things to you.
_______________________________________________
Corpora mailing list -- [email protected]
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to [email protected]