On 12/01/2017 10:20, Joern Kottmann wrote:
The POSTagger interface just grew over time and I am not sure it is
actually that great. Today there are different ways of returning
probabilities.

- tag and probs (this is POSTaggerME only and not in the interface)
- tokKSequences, which returns multiple possible Sequence objects

Wouldn't it make sense to unify that? One new method which takes a sentence
and returns the best Sequence.

What do you think about a thread safe wrapper object for the POS Tagger? If
you want it thread safe you instantiate that one, internally it could use
ThreadLocal to switch between multiple POSTaggerME instances. Since
POSTagger (the interface) can be thread safe as it is now this seems to be
a rather simple change.

I agree, that's even better.

I'd be happy to do that. What's the procedure nowadays? Create a pull request on GH, which is then reviewed?

--Thilo


With your proposed solution a user would have to write
tagger.getThreadLocal().tag(...)
instead of
tagger.tag(...)

Jörn


On Thu, Jan 12, 2017 at 9:48 AM, Thilo Goetz <[email protected]> wrote:

On 11/01/2017 22:51, Joern Kottmann wrote:

On Wed, 2017-01-11 at 11:05 +0100, Thilo Goetz wrote:

in a recent project, I was using SentenceDetectorME, TokenizerME and
POSTaggerME. It turns out that none of those is thread safe. This is
because the classification probabilities for the last tag() call
(for
example) are stored in a member variable and can be retrieved by a
separate API call.

The POSTagger already has the Sequence object to return the result
with probabilties. If we would introduce a new method we can probably
just deprecate the method to retrieve the probs.

Should be a minor change to have an interface that can be thread safe.

[...]
I don't want to muddy the waters, but I had another idea: we could also
add a getThreadLocal() method to the tools we want. You would create a
POSTaggerME (for example) like always, and if you needed a per thread
version, you could then call getThreadLocal(), which would give you another
POSTaggerME with the same model, per thread. The advantage as I see it is
that the API extension would be conservative (just one method added), and
getting the probabilities would continue to work as before because you have
one instance per thread.

Does that make sense? I'm not sure I'm explaining this in the best
possible manner...

--Thilo



Reply via email to