Hello, I'm looking to scale out my NLP pipeline across a Spark cluster and was thinking UIMA-AS might work as a solution. However, I'm not sure how this would work in practice, because in UIMA-AS you essentially start your NLP pipeline as a service behind a message broker: the client sends documents to the broker using the hostname:port of the server. I don't see how that maps onto a Spark environment.
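For context, this is roughly how my service is deployed today: a UIMA-AS deployment descriptor ties the pipeline's aggregate analysis engine to an input queue on the broker. This is a minimal sketch; the queue name and the aggregate descriptor path are placeholders for my actual setup:

```xml
<analysisEngineDeploymentDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <name>MyPipelineDeployment</name>
  <description>NLP pipeline deployed as a UIMA-AS service</description>
  <deployment protocol="jms" provider="activemq">
    <!-- the service listens on this queue at the given broker -->
    <service>
      <inputQueue endpoint="myPipelineQueue" brokerURL="tcp://localhost:61616" prefetch="0"/>
      <topDescriptor>
        <import location="descriptors/MyAggregateAE.xml"/>
      </topDescriptor>
    </service>
  </deployment>
</analysisEngineDeploymentDescription>
```

The client then initializes with the same brokerURL and endpoint, which is exactly why I'm stuck on where that host:port would come from in a cluster.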
On my local machine, I start the broker on localhost:61616, and then I can run multiple pipelines in parallel. So, in a cluster, would each machine have to start its own broker? And how would you configure the clients to distribute the load? It seems like you would have to start multiple clients independently, each specifying a subset of the documents, and point each one at a different server, so you would need the host:port of every service. Or is there a way to put some manager in between that handles the distribution for you? Ideally, I would want a single client to be able to make a request and have the load distributed automatically.
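To illustrate what I mean by distributing the load manually: each document would have to be assigned to one of the known service endpoints, e.g. by round-robin. This is just a sketch of the assignment logic, not UIMA-AS API, and the broker URLs are made-up placeholders:

```java
import java.util.List;

public class RoundRobin {
    // Pick the endpoint for the docIndex-th document by cycling
    // through the list of known service broker URLs.
    public static String pickEndpoint(List<String> endpoints, int docIndex) {
        return endpoints.get(docIndex % endpoints.size());
    }

    public static void main(String[] args) {
        // Placeholder host:port values for two hypothetical service machines.
        List<String> endpoints = List.of("tcp://node1:61616", "tcp://node2:61616");
        for (int i = 0; i < 4; i++) {
            System.out.println("doc " + i + " -> " + pickEndpoint(endpoints, i));
        }
    }
}
```

This is exactly the bookkeeping I'd rather not do by hand: every client has to know the full endpoint list up front, and a dead node silently loses its share of documents.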
