Hi, I would like to index documents which contain term frequencies instead of the actual text. For example, instead of getting "The big wolf ate the big sheep" I would get "the|2 big|2 wolf|1 ate|1 sheep|1". An easy way to index this would be to convert the frequencies back into text, i.e. into something like "the the big big wolf ate sheep", but that does not look very elegant, since I would be expanding the text just to have Lucene "compress" it again.
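To make the naive expansion concrete, here is a minimal sketch in plain Java. It makes no Lucene calls at all; the class name TermFreqExpander and its methods are hypothetical names of my own, and I am assuming the input is whitespace-separated "term|freq" pairs with positive integer counts.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: turns "term|freq" pairs back into repeated text.
final class TermFreqExpander {

    // Parse whitespace-separated "term|freq" pairs, keeping the input order.
    // Repeated terms have their counts summed.
    static Map<String, Integer> parse(String compressed) {
        Map<String, Integer> freqs = new LinkedHashMap<>();
        String trimmed = compressed.trim();
        if (trimmed.isEmpty()) {
            return freqs;
        }
        for (String pair : trimmed.split("\\s+")) {
            // Split on the last '|' so that a term may itself contain '|'.
            int sep = pair.lastIndexOf('|');
            if (sep <= 0 || sep == pair.length() - 1) {
                throw new IllegalArgumentException("Malformed pair: " + pair);
            }
            String term = pair.substring(0, sep);
            int freq = Integer.parseInt(pair.substring(sep + 1));
            if (freq < 1) {
                throw new IllegalArgumentException("Frequency must be positive: " + pair);
            }
            freqs.merge(term, freq, Integer::sum);
        }
        return freqs;
    }

    // Expand the pairs into plain text by repeating each term freq times.
    static String expand(String compressed) {
        List<String> tokens = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : parse(compressed).entrySet()) {
            for (int i = 0; i < entry.getValue(); i++) {
                tokens.add(entry.getKey());
            }
        }
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        System.out.println(expand("the|2 big|2 wolf|1 ate|1 sheep|1"));
        // prints: the the big big wolf ate sheep
    }
}
```

The expanded string could then be indexed like any other text field, which is exactly the round trip I would like to avoid.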
Any ideas, or directions I should look into? So far I am considering:
- A custom Analyzer (so that I expand the terms on the fly while generating the TokenStream from the compressed text)

Thanks in advance,
Stephen