Hello everybody, I would like to explain in more detail how the proposed solution works and how it could be integrated into the existing system.
Algorithm: the target word (the one to find synonyms for) is masked, and the sentence is passed to a BERT model. BERT's candidates are then filtered with FastText. The minimum scores for BERT and FastText are configurable, and so are the weights used to combine the results of these two models. The weights could therefore be trained later, although that requires some real data. The pipeline can also be extended with additional models or filters; for example, for a specific domain we could replace the models or fine-tune them on domain-specific data.

Application: right now there are two ways to use this pipeline, a "static" approach and a "dynamic" approach. In the "static" approach, potential synonyms are generated for an NLPCraft model and its example sentences, and these are used to expand the model manually. In the "dynamic" approach, a sentence is passed to the model, which returns potential synonyms for the word. You can think of it as one more enricher that spawns new tokens. Then, if users want to use it in their model, they can write a macro rule, i.e. "I want exactly word(s) A, or any word that the model thinks is a synonym of word A".

Thanks,
Gleb.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, May 8, 2020 8:56 AM, Sergey Makov <[email protected]> wrote:

> Hi Gleb,
>
> thank you very much for your contribution,
> it looks really promising!
>
> Regards,
> Sergey
>
> On Fri, May 8, 2020 at 6:52 PM Ifropc [email protected] wrote:
>
> > Hello everybody!
> > I created a pull request for implementing auto-enriching user models with
> > synonyms (NLPCRAFT-11), please see details in PR on Github.
> > Comments are appreciated, if any.
> > Thanks,
> > Gleb.
> >
> > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > On Thursday, May 7, 2020 11:19 PM, GitBox [email protected] wrote:
> >
> > > Ifropc opened a new pull request #1:
> > > URL: https://github.com/apache/incubator-nlpcraft/pull/1
> > > This pull request should resolve NLPCRAFT-11: auto-enrich user models
> > > with synonyms
> > > Proposed approach uses Bert (RoBerta) model to generate synonyms for
> > > given context, masking target word. Afterwards, output is filtered with
> > > FastText for specified context.
> > > This feature could also be integrated with NLPCRAFT-41
> > >
> > > This is an automated message from the Apache Git Service.
> > > To respond to the message, please log on to GitHub and use the
> > > URL above to go to the specific comment.
> > > For queries about this service, please contact Infrastructure at:
> > > [email protected]
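To make the Algorithm paragraph above more concrete, here is a minimal Python sketch of the mask → BERT → FastText filter → weighted score flow. It does not call the real models. `predict_masked` stands in for a BERT/RoBERTa fill-mask call and `similarity` for a FastText similarity lookup. All names, the default thresholds and weights, and the weighted-sum combination are assumptions for illustration, not the code in the pull request. The FastText filter is also assumed to compare each candidate with the target word; the actual implementation may compare against the context instead.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stand-ins for the real models (assumptions, not the PR's API):
#   predict_masked(sentence) -> [(candidate_word, bert_score), ...]
#   similarity(word_a, word_b) -> FastText-style similarity score
PredictMasked = Callable[[str], List[Tuple[str, float]]]
Similarity = Callable[[str, str], float]

MASK = "<mask>"  # RoBERTa-style mask token (assumed)


@dataclass
class SynonymConfig:
    # Illustrative defaults; in the proposal these are configurable and could be trained.
    min_bert_score: float = 0.1
    min_ft_score: float = 0.3
    bert_weight: float = 0.5
    ft_weight: float = 0.5


def mask_target(tokens: List[str], index: int) -> str:
    """Replace the target word with the mask token and rebuild the sentence."""
    masked = list(tokens)  # copy so the caller's tokens are not modified
    masked[index] = MASK
    return " ".join(masked)


def find_synonyms(tokens: List[str], index: int,
                  predict_masked: PredictMasked, similarity: Similarity,
                  cfg: SynonymConfig) -> List[Tuple[str, float]]:
    target = tokens[index]
    results = []
    for word, bert_score in predict_masked(mask_target(tokens, index)):
        # Skip the target itself and weak masked-LM candidates.
        if word == target or bert_score < cfg.min_bert_score:
            continue
        # FastText filter: drop candidates not similar enough to the target.
        ft_score = similarity(target, word)
        if ft_score < cfg.min_ft_score:
            continue
        # Weighted combination of both model scores.
        combined = cfg.bert_weight * bert_score + cfg.ft_weight * ft_score
        results.append((word, combined))
    # Best candidates first.
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Toy stubs so the sketch runs without downloading any model.
def fake_bert(sentence: str) -> List[Tuple[str, float]]:
    return [("purchase", 0.40), ("get", 0.30), ("sell", 0.20), ("buy", 0.05)]


FAKE_FT = {("buy", "purchase"): 0.8, ("buy", "get"): 0.5, ("buy", "sell"): 0.2}


def fake_ft(a: str, b: str) -> float:
    return FAKE_FT.get((a, b), 0.0)


if __name__ == "__main__":
    tokens = "I want to buy a car".split()
    print(find_synonyms(tokens, 3, fake_bert, fake_ft, SynonymConfig()))
```

With the toy stubs, "sell" is dropped by the FastText threshold and "purchase" ranks above "get". Keeping the weights in a config object is what would let them be tuned or trained later, as the message suggests.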

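The "dynamic" approach from the Application paragraph above can be pictured as an enricher that spawns synonym tokens for each word, plus a rule that accepts either the listed word itself or a word whose spawned synonyms include it. This is a hypothetical Python sketch, not NLPCraft's enricher or macro API. `enrich`, `macro_matches`, and `toy_finder` are illustrative names, and `toy_finder` stands in for the BERT + FastText pipeline.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical synonym finder: (tokens, index) -> [(synonym, score), ...],
# e.g. the BERT + FastText pipeline described in the Algorithm paragraph.
Finder = Callable[[List[str], int], List[Tuple[str, float]]]


def enrich(tokens: List[str], finder: Finder) -> List[Set[str]]:
    """'Dynamic' approach: spawn a set of synonym tokens for every position."""
    return [{word for word, _ in finder(tokens, i)} for i in range(len(tokens))]


def macro_matches(token: str, spawned: Set[str], listed: Set[str]) -> bool:
    """Rule: exactly one of the listed words, or a token whose spawned synonyms include one."""
    return token in listed or bool(spawned & listed)


def toy_finder(tokens: List[str], i: int) -> List[Tuple[str, float]]:
    # Stand-in for the real pipeline: it only knows that "purchase" ~ "buy".
    return [("buy", 0.6)] if tokens[i] == "purchase" else []


if __name__ == "__main__":
    tokens = "I want to purchase a car".split()
    spawned = enrich(tokens, toy_finder)
    print([t for t, s in zip(tokens, spawned) if macro_matches(t, s, {"buy"})])
```

A user who only listed "buy" would still match "purchase" here, because the enricher spawned "buy" as a synonym at that position.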