Quoting Eddie Epstein:

> Parallelizing annotators to speed up processing as you describe sounds
> attractive, except for all the ways the annotators can conflict with each
> other and the difficulty in detecting/debugging.

Again, no one would be forced to parallelize anything if they did not want to :) If the annotators are logically independent, there can be no conflicts. Whether they are independent should be fairly clear from their purpose; another way of putting it is to ask whether the order in which they run matters. The only exceptions are the additional synchronization requirements I already mentioned, such as two annotators adding to the same list or incrementing a shared counter. What other ways of conflicting did you have in mind?
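To make the shared-list and shared-counter caveat concrete, here is a minimal Java sketch. It uses plain `java.util.concurrent` rather than any UIMA API. `SharedDoc` is a hypothetical stand-in for a CAS, and the two "annotators" (`capitalizedWords`, `numbers`) are hypothetical too. Neither reads the other's output, so their order doesn't matter. The only synchronization needed is for the state they both write: a thread-safe list and an atomic counter.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class IndependentAnnotators {

    // Hypothetical stand-in for a CAS, reduced to the state both annotators touch.
    static class SharedDoc {
        final String text;
        // Shared list both annotators append to: must be thread-safe.
        final List<String> annotations = new CopyOnWriteArrayList<>();
        // Shared counter both annotators increment: must be updated atomically.
        final AtomicInteger annotationCount = new AtomicInteger();

        SharedDoc(String text) { this.text = text; }

        void addAnnotation(String label) {
            annotations.add(label);
            annotationCount.incrementAndGet();
        }
    }

    // Hypothetical annotator A: marks capitalized words. Never reads B's output.
    static void capitalizedWords(SharedDoc doc) {
        for (String w : doc.text.split("\\s+")) {
            if (!w.isEmpty() && Character.isUpperCase(w.charAt(0))) {
                doc.addAnnotation("Cap:" + w);
            }
        }
    }

    // Hypothetical annotator B: marks all-digit tokens. Never reads A's output.
    static void numbers(SharedDoc doc) {
        for (String w : doc.text.split("\\s+")) {
            if (w.matches("\\d+")) {
                doc.addAnnotation("Num:" + w);
            }
        }
    }

    // Runs A and B concurrently. Because they are logically independent, the
    // resulting set of annotations is the same whichever order they run in.
    static SharedDoc runInParallel(String text) throws InterruptedException, ExecutionException {
        SharedDoc doc = new SharedDoc(text);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<?> a = pool.submit(() -> capitalizedWords(doc));
            Future<?> b = pool.submit(() -> numbers(doc));
            a.get(); // wait for A; rethrows any exception it raised
            b.get(); // wait for B
        } finally {
            pool.shutdown();
        }
        return doc;
    }

    public static void main(String[] args) throws Exception {
        SharedDoc doc = runInParallel("Nick ran 3 tests on UIMA 2 times");
        System.out.println(doc.annotationCount.get() + " " + doc.annotations);
    }
}
```

For the sample text the count is always 4. The order of entries in the list can differ between runs, and that is harmless precisely because the two annotators are independent.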

In any case, the fact that a feature could be used incorrectly doesn't sound like a good reason not to support it. The same could be said of pretty much anything (again, the existence of the low-level API is a good example of this!)

> Of course parallel CAS processing is currently only supported for remote
> annotators, but that could be fixed by making additional in-memory CAS
> copies [with the turbo charged CasCopier recently introduced] and creating
> a new in-memory CAS merger that worked like the merger in the
> CasDeserializer.

Why take a route that adds complexity and overhead rather than solving the problem through simplification? Part of my original motivation was to avoid having to copy entire CASes for this purpose, which gets very expensive even with the recent optimization. It's also much less flexible in terms of the kinds of parallelization possible. I envisage a generic flow controller that would be configured with dependency information rather than a fixed flow (for example, annotator C depends on annotators A and B, D depends on C, E depends on A, etc.). It would then optimally "compress" their execution onto a prescribed number of threads. That is, each annotator would run as soon as all the annotators it depends on had completed.
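The dependency-driven flow controller described above could be sketched roughly as follows. This is a minimal Java sketch under stated assumptions, not UIMA's FlowController API. Annotators are modeled as hypothetical named `Runnable`s, and `DependencyScheduler`, `run` and `schedule` are illustrative names. Each annotator becomes a `CompletableFuture` that is started on a fixed-size pool once the futures of all its dependencies have completed. The `main` method wires up the A/B/C/D/E example from the text.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical dependency-driven scheduler; not the UIMA FlowController API.
class DependencyScheduler {

    // Runs every annotator on a pool of `threads` threads. deps maps an annotator
    // name to the names it must wait for; names absent from deps have no dependencies.
    static void run(Map<String, Runnable> annotators, Map<String, List<String>> deps, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Map<String, CompletableFuture<Void>> futures = new HashMap<>();
        try {
            for (String name : annotators.keySet()) {
                schedule(name, annotators, deps, futures, new HashSet<>(), pool);
            }
            // Block until every annotator has finished (or one has failed).
            CompletableFuture.allOf(futures.values().toArray(new CompletableFuture<?>[0])).get();
        } finally {
            pool.shutdown();
        }
    }

    // Creates, at most once per name, a future that starts `name` on the pool as
    // soon as the futures of all its dependencies have completed. `visiting`
    // holds the names on the current recursion path and is used to detect cycles.
    private static CompletableFuture<Void> schedule(String name, Map<String, Runnable> annotators,
            Map<String, List<String>> deps, Map<String, CompletableFuture<Void>> futures,
            Set<String> visiting, Executor pool) {
        CompletableFuture<Void> existing = futures.get(name);
        if (existing != null) {
            return existing;
        }
        Runnable task = annotators.get(name);
        if (task == null) {
            throw new IllegalArgumentException("Unknown annotator: " + name);
        }
        if (!visiting.add(name)) {
            throw new IllegalArgumentException("Dependency cycle involving: " + name);
        }
        List<CompletableFuture<Void>> upstream = new ArrayList<>();
        for (String dep : deps.getOrDefault(name, List.of())) {
            upstream.add(schedule(dep, annotators, deps, futures, visiting, pool));
        }
        visiting.remove(name);
        CompletableFuture<Void> f = CompletableFuture
                .allOf(upstream.toArray(new CompletableFuture<?>[0]))
                .thenRunAsync(task, pool);
        futures.put(name, f);
        return f;
    }

    public static void main(String[] args) throws Exception {
        // The example from the text: C needs A and B, D needs C, E needs A.
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        Map<String, Runnable> annotators = new LinkedHashMap<>();
        for (String n : List.of("A", "B", "C", "D", "E")) {
            annotators.put(n, () -> log.add(n)); // record when each annotator runs
        }
        Map<String, List<String>> deps = Map.of(
                "C", List.of("A", "B"),
                "D", List.of("C"),
                "E", List.of("A"));
        run(annotators, deps, 2);
        System.out.println(log); // order varies, but always respects the dependencies
    }
}
```

Chaining `allOf(...).thenRunAsync(task, pool)` means no pool thread sits blocked waiting on dependencies. A thread is only occupied once an annotator is actually ready to run, which is what lets a small, prescribed number of threads be used efficiently. An unknown annotator name or a cycle in the dependency map is reported as an `IllegalArgumentException`.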

Regards,
Nick
