I peeked at the code and I still think it's worth experimenting with extracting a facade for the construction and lookup of words. There may even be a middle ground between size and speed: if you assume a Zipfian distribution of words, the most common ones could be stored/cached outside of the FST (even in a plain associative dictionary). This would require external frequency information during construction, but that isn't difficult to obtain.
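To make the idea concrete, here is a rough sketch of what I mean. All names (WordLookup, MapBackedLookup, FrequencyCachedLookup) are made up for illustration and are not existing Lucene classes; the "slow" backend is just a HashMap standing in for the FST, and entries are plain integers rather than whatever the real dictionary stores.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical facade: maps a word to its stored entry (an int id here), or null if absent. */
interface WordLookup {
    Integer lookup(String word);
}

/** Stand-in for the compact but slower backend (an FST in the real code; a map in this sketch). */
final class MapBackedLookup implements WordLookup {
    private final Map<String, Integer> entries;
    int calls = 0; // counts lookups so the effect of the cache is observable

    MapBackedLookup(Map<String, Integer> entries) {
        this.entries = new HashMap<>(entries);
    }

    @Override
    public Integer lookup(String word) {
        calls++;
        return entries.get(word);
    }
}

/** Keeps the topN most frequent words in a hash map and delegates everything else. */
final class FrequencyCachedLookup implements WordLookup {
    private final Map<String, Integer> hot = new HashMap<>();
    private final WordLookup fallback;

    /** frequencies is the external frequency information, assumed to be supplied by the caller. */
    FrequencyCachedLookup(WordLookup fallback, Map<String, Long> frequencies, int topN) {
        this.fallback = fallback;
        List<Map.Entry<String, Long>> byFreq = new ArrayList<>(frequencies.entrySet());
        byFreq.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        for (Map.Entry<String, Long> e : byFreq.subList(0, Math.min(topN, byFreq.size()))) {
            Integer value = fallback.lookup(e.getKey());
            if (value != null) {
                hot.put(e.getKey(), value); // only cache words the dictionary actually contains
            }
        }
    }

    @Override
    public Integer lookup(String word) {
        Integer cached = hot.get(word);
        return cached != null ? cached : fallback.lookup(word);
    }
}
```

With a Zipfian distribution a small topN should cover a large share of lookups, so the extra memory stays bounded while most queries never reach the slower backend. Whether that pays off in practice would have to be measured.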
D.

On Thu, Feb 11, 2021 at 8:54 AM Dawid Weiss <[email protected]> wrote:

> I didn't mean for Peter to write both backends but perhaps, if he's
> experimenting already anyway, make it possible to extract an interface
> which could be substituted externally with different implementations. Makes
> it easier to tinker with various options, even for us.
>
> D.
>
> On Thu, Feb 11, 2021 at 1:16 AM Robert Muir <[email protected]> wrote:
>
>> On Wed, Feb 10, 2021 at 3:05 PM Dawid Weiss <[email protected]>
>> wrote:
>> > Maybe the "backend" could be configurable somehow so that you could
>> > change the strategy depending on your needs?... I haven't looked at how
>> > FSTs are used but if can be hidden behind a facade then an alternative
>> > implementation could be provided depending on one's need?
>> >
>> > D.
>> >
>>
>> I don't have any confidence that solr would default to the "smaller"
>> option or fix how they manage different solr cores or thousands of
>> threads or any of the analyzer issues. And who would maintain this
>> separate hunspell backend? I don't think it is fair to Peter to have
>> to cope with 2 implementations of hunspell, 1 is certainly enough...
>> :). It's all apache license, at the end of the day if someone wants to
>> step up, let 'em. otherwise let's get out of their way.
