Re: Thread-safe versions of some of the tools

Joern Kottmann Thu, 12 Jan 2017 04:21:05 -0800

Exactly, we do this for every code change we have. Commit rights are
usually only used to merge a PR.


1. Fork our repository on github
2. Do you work on a new branch
3. Send us a PR

And we need a jira as well. Project is using checkstyle and is enforcing
this with compile errors, so don't be confused if your build fails for
style violation.

Jörn

On Thu, Jan 12, 2017 at 1:07 PM, Thilo Goetz <[email protected]> wrote:

> On 12/01/2017 10:20, Joern Kottmann wrote:
>
>> The POSTagger interface just grew over time and I am not sure it is
>> actually that great. Today there are different ways of returning
>> probabilities.
>>
>> - tag and probs (this is POSTaggerME only and not in the interface)
>> - tokKSequences, which returns multiple possible Sequence objects
>>
>> Wouldn't it make sense to unify that? One new method which takes a
>> sentence
>> and returns the best Sequence.
>>
>> What do you think about a thread safe wrapper object for the POS Tagger?
>> If
>> you want it thread safe you instantiate that one, internally it could use
>> ThreadLocal to switch between multiple POSTaggerME instances. Since
>> POSTagger (the interface) can be thread safe as it is now this seems to be
>> a rather simple change.
>>
>
> I agree, that's even better.
>
> I'd be happy to do that. What's the procedure nowadays? Create a pull
> request on GH, which is then reviewed?
>
> --Thilo
>
>
>
>> With your proposed solution a user would have to write
>> tagger.getThreadLocal().tag(...)
>> instead of
>> tagger.tag(...)
>>
>> Jörn
>>
>>
>> On Thu, Jan 12, 2017 at 9:48 AM, Thilo Goetz <[email protected]> wrote:
>>
>> On 11/01/2017 22:51, Joern Kottmann wrote:
>>>
>>> On Wed, 2017-01-11 at 11:05 +0100, Thilo Goetz wrote:
>>>>
>>>> in a recent project, I was using SentenceDetectorME, TokenizerME and
>>>>> POSTaggerME. It turns out that none of those is thread safe. This is
>>>>> because the classification probabilities for the last tag() call
>>>>> (for
>>>>> example) are stored in a member variable and can be retrieved by a
>>>>> separate API call.
>>>>>
>>>>> The POSTagger already has the Sequence object to return the result
>>>> with probabilties. If we would introduce a new method we can probably
>>>> just deprecate the method to retrieve the probs.
>>>>
>>>> Should be a minor change to have an interface that can be thread safe.
>>>>
>>>> [...]
>>>>
>>> I don't want to muddy the waters, but I had another idea: we could also
>>> add a getThreadLocal() method to the tools we want. You would create a
>>> POSTaggerME (for example) like always, and if you needed a per thread
>>> version, you could then call getThreadLocal(), which would give you
>>> another
>>> POSTaggerME with the same model, per thread. The advantage as I see it is
>>> that the API extension would be conservative (just one method added), and
>>> getting the probabilities would continue to work as before because you
>>> have
>>> one instance per thread.
>>>
>>> Does that make sense? I'm not sure I'm explaining this in the best
>>> possible manner...
>>>
>>> --Thilo
>>>
>>>
>>>
>

Re: Thread-safe versions of some of the tools

Reply via email to