Chunker - proposal to change API (break compatibility)

William Colen Thu, 10 Nov 2016 01:35:16 -0800

Hi,

Today the Chunker's sequence consists only of the sentence's POS tags.


Although we use both the tokens and the tags in the context generator, in
the current API we cannot use the tokens in the sequence validator, because
it does not have access to them.

In Portuguese, I know that certain word + tag combinations never occur in a
specific kind of phrase. Today I cannot add a rule that filters them out to
the sequence validator.

I know it may be better to train the model so that it learns this, but the
hack of adding such a rule to the sequence validator is helpful.
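
A rough sketch of the kind of rule this would allow, assuming the validator
received token + tag pairs instead of tags alone. Every type below is a
hypothetical stand-in and not the actual OpenNLP API, and the forbidden
word/tag/phrase combination is only a placeholder, not a real rule of
Portuguese.

```java
// Hypothetical sketch: these types are illustrative, NOT the OpenNLP API.

/** A token paired with its POS tag (hypothetical type). */
final class TokenTag {
    final String token;
    final String tag;

    TokenTag(String token, String tag) {
        this.token = token;
        this.tag = tag;
    }
}

/** Validator shape assumed for this sketch: it sees tokens as well as tags. */
interface TokenTagSequenceValidator {
    boolean validSequence(int i, TokenTag[] input, String[] priorOutcomes, String outcome);
}

/** Rejects one placeholder (word, tag, chunk type) combination; accepts the rest. */
final class ForbiddenComboValidator implements TokenTagSequenceValidator {
    private final String word;
    private final String tag;
    private final String chunkType;

    ForbiddenComboValidator(String word, String tag, String chunkType) {
        this.word = word;
        this.tag = tag;
        this.chunkType = chunkType;
    }

    @Override
    public boolean validSequence(int i, TokenTag[] input, String[] priorOutcomes, String outcome) {
        // Outcomes are assumed to look like "B-NP", "I-VP" or "O"; drop the B-/I- prefix.
        String type = outcome.length() > 2 ? outcome.substring(2) : outcome;
        TokenTag current = input[i];
        boolean forbidden = current.token.equalsIgnoreCase(word)
                && current.tag.equals(tag)
                && type.equals(chunkType);
        return !forbidden;
    }
}

final class ChunkValidatorSketch {
    public static void main(String[] args) {
        // Placeholder sentence fragment and rule, for illustration only.
        TokenTag[] sentence = { new TokenTag("que", "conj"), new TokenTag("casa", "n") };
        TokenTagSequenceValidator validator = new ForbiddenComboValidator("que", "conj", "NP");

        System.out.println(validator.validSequence(0, sentence, new String[0], "B-NP"));      // false
        System.out.println(validator.validSequence(1, sentence, new String[] {"O"}, "B-NP")); // true
    }
}
```

With a tags-only sequence the check on current.token is impossible, which is
the limitation described above.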

Do you think we can change it for release 1.7.0? I have already tried this
change in a local branch for a personal project and it works (although that
was on OpenNLP 1.5.3).

This would break API backward compatibility, but the existing models would
not be affected.

Thank you
William
