Hello everybody,

I would like to explain in more detail how the proposed solution works and how it
could be integrated into the existing system.

Algorithm: the target word (the word to find synonyms for) is masked and passed
to the BERT model. The BERT results are then filtered with FastText. The minimum
scores for BERT and FastText are configurable, and so are the weights given to
the results of these 2 models. Therefore, the weights could be trained later
(though some real data is required for that).
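
A minimal sketch of the masking and filtering step, in Python. It assumes the BERT and FastText scores for each candidate have already been computed (running the models themselves is left out). `mask_target`, `rank_synonyms`, `Config` and the threshold/weight names are hypothetical, not the PR's actual code. The `<mask>` token is the one RoBERTa uses.

```python
from __future__ import annotations

from dataclasses import dataclass

MASK = "<mask>"  # RoBERTa's mask token; original BERT uses "[MASK]"


def mask_target(words: list[str], index: int, mask: str = MASK) -> str:
    """Replace the target word with the mask token and rebuild the sentence."""
    masked = list(words)
    masked[index] = mask
    return " ".join(masked)


@dataclass
class Config:
    # Hypothetical parameter names; the PR's real settings may differ.
    min_bert: float = 0.1      # drop candidates BERT scores below this
    min_fasttext: float = 0.3  # drop candidates FastText scores below this
    w_bert: float = 0.5        # weight of the BERT score in the final ranking
    w_fasttext: float = 0.5    # weight of the FastText score


def rank_synonyms(bert_scores: dict[str, float],
                  fasttext_scores: dict[str, float],
                  cfg: Config) -> list[tuple[str, float]]:
    """Keep candidates passing both thresholds, ranked by weighted score."""
    ranked = []
    for word, b in bert_scores.items():
        f = fasttext_scores.get(word, 0.0)
        if b >= cfg.min_bert and f >= cfg.min_fasttext:
            ranked.append((word, cfg.w_bert * b + cfg.w_fasttext * f))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked
```

For example, `mask_target("what is the weather in london".split(), 3)` gives `"what is the <mask> in london"`. With BERT scores `{"forecast": 0.4, "temperature": 0.3, "time": 0.2, "rain": 0.05}` and FastText scores `{"forecast": 0.7, "temperature": 0.6, "time": 0.1, "rain": 0.8}`, the default config keeps "forecast" and "temperature" (in that order) and drops "time" (FastText score too low) and "rain" (BERT score too low). Since thresholds and weights are plain numbers, they are what a later training step could tune.
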
Moreover, this pipeline could be improved by adding more models or filters;
e.g. for a specific domain we can replace the models or fit them with
domain-specific data.
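
One way to keep the pipeline open to extra models or filters, sketched under the assumption that each one can be wrapped as a function scoring a candidate word in its masked context. Every stage carries its own minimum score and weight, so a domain-specific model is just another stage. `Stage`, `Scorer` and `run_pipeline` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# A scorer maps (masked sentence, candidate word) to a score, e.g. in [0, 1].
Scorer = Callable[[str, str], float]


@dataclass
class Stage:
    name: str
    score: Scorer
    min_score: float  # candidates scoring below this are filtered out
    weight: float     # contribution of this stage to the final ranking


def run_pipeline(masked: str, candidates: list[str],
                 stages: list[Stage]) -> list[tuple[str, float]]:
    """Apply every stage as a filter, then rank survivors by weighted score."""
    results = []
    for word in candidates:
        scores = [(s, s.score(masked, word)) for s in stages]
        if all(value >= s.min_score for s, value in scores):
            total = sum(s.weight * value for s, value in scores)
            results.append((word, total))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results
```

With two stages (one for BERT, one for FastText) this is the configuration described above; adapting to a domain means appending a stage or swapping the `score` function of an existing one.
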

Application: right now there are 2 ways to use this pipeline, the "static" and
the "dynamic" approaches.
With the "static" approach, potential synonyms are generated for an NLPCraft
model and its example sentences, so that the model can be expanded manually.
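
A sketch of the "static" flow, assuming a hypothetical `suggest(words, index)` function that wraps the BERT + FastText pipeline and returns ranked candidates for `words[index]`. `suggest_for_examples` and its parameters are illustrative names. It only collects suggestions for a person to review; it does not edit the model.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable

# Hypothetical: suggest(words, index) -> ranked synonym candidates for words[index].
Suggester = Callable[[list[str], int], list[str]]


def suggest_for_examples(examples: list[str], targets: set[str],
                         suggest: Suggester, top_k: int = 5) -> dict[str, list[str]]:
    """Collect candidate synonyms for each target word across all examples.

    The result is meant for manual review before anything is added
    to the model; nothing is written back automatically.
    """
    found: dict[str, list[str]] = defaultdict(list)
    for sentence in examples:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            if word in targets:
                for cand in suggest(words, i)[:top_k]:
                    # Skip the word itself and duplicates from other examples.
                    if cand != word and cand not in found[word]:
                        found[word].append(cand)
    return dict(found)
```

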
The "dynamic" approach is to pass a sentence to the model, which returns
potential synonyms for the word. You can look at it as one more enricher, which
spawns new tokens. Then, if users want to use it in their model, they can write
a macro rule, i.e. "I want to have exactly word(s) A, or any word that the model
thinks is a synonym of word A".
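
A sketch of the "dynamic" flow, again assuming a hypothetical `suggest(words, index)` wrapper around the pipeline. An enricher attaches the suggested synonyms to each token, and a rule matches a token that is exactly word A or whose suggestions include A. `Token`, `enrich` and `matches` are illustrative names, not NLPCraft's enricher or macro API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# Hypothetical: suggest(words, index) -> synonym candidates for words[index].
Suggester = Callable[[list[str], int], list[str]]


@dataclass
class Token:
    text: str
    # Words the synonym model proposes for this token in its context.
    synonyms: set[str] = field(default_factory=set)


def enrich(words: list[str], suggest: Suggester) -> list[Token]:
    """Enricher step: attach model-suggested synonyms to every token."""
    return [Token(w, set(suggest(words, i))) for i, w in enumerate(words)]


def matches(token: Token, word: str, allow_synonyms: bool = True) -> bool:
    """Match exactly `word`, or (optionally) any token the model links to `word`."""
    return token.text == word or (allow_synonyms and word in token.synonyms)
```

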

Thanks,
Gleb.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, May 8, 2020 8:56 AM, Sergey Makov <[email protected]> wrote:

> Hi Gleb,
>
> thank you very much for your contribution,
> it looks really promising!
>
> Regards,
> Sergey
>
> On Fri, May 8, 2020 at 6:52 PM Ifropc [email protected] wrote:
>
> > Hello everybody!
> > I created a pull request implementing auto-enrichment of user models with
> > synonyms (NLPCRAFT-11); please see the details in the PR on GitHub.
> > Comments are appreciated, if any.
> > Thanks,
> > Gleb.
> > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > On Thursday, May 7, 2020 11:19 PM, GitBox [email protected] wrote:
> > > Ifropc opened a new pull request #1:
> > > URL: https://github.com/apache/incubator-nlpcraft/pull/1
> > > This pull request should resolve NLPCRAFT-11: auto-enrich user models 
> > > with synonyms
> > > The proposed approach uses a BERT (RoBERTa) model to generate synonyms for
> > > a given context by masking the target word. Afterwards, the output is
> > > filtered with FastText for the specified context.
> > > This feature could also be integrated with NLPCRAFT-41.

