Hello,

I'm looking to scale out my NLP pipeline across a Spark cluster and was
thinking UIMA-AS might work as a solution. However, I'm not sure how this
would work in practice, because in UIMA-AS you basically start your NLP
pipeline as a service behind a message broker: the client sends documents
to the broker using the hostname:port of the server. So I'm not sure how
you would do that in a Spark environment.

On my local machine, I start the broker on localhost:61616 and can then
run multiple pipelines in parallel. So, in a cluster, would each machine
have to start its own broker? And how would you configure the clients to
distribute the load? It seems like you would have to start multiple
clients independently, each specifying a subset of documents, and tell
each one to send its load to a different server, so you would need the
host:port of each service. Or is there some manager you can put in between
that handles the distribution for you? Ideally, I would want a single
client to be able to make a request and have the load distributed
automatically.
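To make the "multiple clients, each with a subset" idea concrete, here is a minimal sketch of what I mean by splitting the work manually. The endpoint names (nodeA:61616, etc.) are made up, and this only shows the round-robin partitioning of documents across host:port endpoints, not the actual UIMA-AS send:

```python
# Sketch only: round-robin assignment of documents to hypothetical
# UIMA-AS broker endpoints. Each endpoint would then get its own client.
from itertools import cycle

def partition_round_robin(documents, endpoints):
    """Assign each document to an endpoint in round-robin order."""
    assignments = {ep: [] for ep in endpoints}
    for doc, ep in zip(documents, cycle(endpoints)):
        assignments[ep].append(doc)
    return assignments

endpoints = ["nodeA:61616", "nodeB:61616", "nodeC:61616"]
docs = [f"doc{i}" for i in range(7)]
result = partition_round_robin(docs, endpoints)
# e.g. nodeA:61616 gets doc0, doc3, doc6
```

This is exactly the bookkeeping I'd rather not do by hand; I'm hoping there is a broker-side or manager-side way to get the same effect from a single client.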
