Dear all,

I would like to know if it's possible to get a list of ngrams in which hyphenated words are kept as single tokens, perhaps by changing how the text is tokenized.
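
To make the tokenization I have in mind concrete, here is a minimal sketch in plain Python. It is not the counting tool itself; the token pattern and the names `TOKEN_RE`, `tokenize` and `bigrams` are my own assumptions, used only for illustration.

```python
import re
from collections import Counter

# Hypothetical token pattern (an assumption, not the tool's own definition):
# a run of word characters, optionally followed by hyphen-joined runs,
# so "call-connected" is matched as one token instead of two.
TOKEN_RE = re.compile(r"\w+(?:-\w+)*")


def tokenize(text):
    """Return lowercase tokens, keeping hyphenated words whole."""
    return TOKEN_RE.findall(text.lower())


def bigrams(tokens):
    """Count each pair of adjacent tokens."""
    return Counter(zip(tokens, tokens[1:]))


text = "The call-connected signal follows the clear-back signal."
counts = bigrams(tokenize(text))

for (first, second), freq in counts.items():
    print(f"{first}<>{second}<>{freq}")
```

With a pattern like this, the hyphen is part of the token, so the pair ("call-connected", "signal") is counted as one bigram and no separate "call"/"connected" pair is produced.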
For example, I want to get these bigrams:

- call-connected signal
- clear-back signal
- clear-forward signal

instead of two bigrams for each one:

- call<>connected<>179 2608 527
  connected<>signal<>189 320 9176
- clear<>back<>283 1115 733
  back<>signal<>157 380 9176
- clear<>forward<>632 1115 877
  forward<>signal<>493 1547 9176

Thanks a lot,
Mercè