Hello, I am a master's student currently working on a search engine project that uses BM25Similarity. My question is about computing the vocabulary size of a single document. I have looked through the code base but have not found anything for this specific purpose. Is there a way to compute the number of distinct terms in a single document, i.e. the size of its set of unique terms? Please let me know if you can help with this. Many thanks.
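To illustrate the quantity in question, here is a minimal plain-Java sketch that counts the distinct terms of one document's text. It is not Lucene code and does not use the index: the helper names `distinctTermCount` and `tokenCount` are hypothetical, and it assumes a naive tokenizer (lowercase, split on anything that is not a letter or digit), which will not match what a real Analyzer produces. It also prints the total token count, since the length normalization in the standard BM25 formula uses the document length in tokens, which is a different number from the vocabulary size.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class DocVocabulary {

    // Hypothetical tokenizer (assumption): lowercase, then split on runs of
    // characters that are neither letters nor digits. A Lucene Analyzer may differ.
    static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
    }

    // Hypothetical helper: number of distinct terms (vocabulary size) in one document.
    static int distinctTermCount(String text) {
        Set<String> vocab = new HashSet<>();
        for (String token : tokenize(text)) {
            if (!token.isEmpty()) { // split() yields a leading "" if text starts with a delimiter
                vocab.add(token);
            }
        }
        return vocab.size();
    }

    // Hypothetical helper: total number of tokens (document length).
    static int tokenCount(String text) {
        int n = 0;
        for (String token : tokenize(text)) {
            if (!token.isEmpty()) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String doc = "The cat sat on the mat. The mat was flat.";
        System.out.println(distinctTermCount(doc)); // prints 7
        System.out.println(tokenCount(doc));        // prints 10
    }
}
```

Repeated terms ("the", "mat") are counted once in the vocabulary size but every time in the token count.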
Michael