Hello,

The need for a content processing pipeline has led me to implement an output connector for RabbitMQ. In my current setup Logstash reads from the queue, does some simple processing, and feeds the documents into Elasticsearch. Scaling is straightforward: just add more Logstash processes. Of the content processing frameworks I've looked at lately, Logstash seems to be the one with the most traction at the moment. Support for delete operations (in Logstash) is still missing, but is being addressed.
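To make the two questions below a bit more concrete, here is a minimal sketch (Python, standard library only) of what a document payload and a failure-status message might look like. All field names here are illustrative assumptions on my part, not an agreed schema, and the actual Logstash event layout depends on the Logstash version and codec in use:

```python
import json
from datetime import datetime, timezone

def build_event(doc_id, body, content_type):
    """Sketch of a Logstash-friendly JSON document payload.

    Field names are hypothetical; only "@timestamp" is commonly
    expected by Logstash, and even that varies by version/codec.
    """
    return json.dumps({
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "doc_id": doc_id,             # document identifier from Manifold
        "content_type": content_type, # MIME type of the original document
        "message": body,              # extracted text content
    })

def status_message(doc_id, ok, error=None):
    """Sketch of a status message Logstash could write to a separate
    queue so that Manifold can update the document's status."""
    msg = {"doc_id": doc_id, "status": "ok" if ok else "failed"}
    if error is not None:
        msg["error"] = error          # human-readable failure reason
    return json.dumps(msg)
```

A consumer on the Manifold side would then read the status queue and mark failed documents for re-feeding; in the fire-and-forget variant the status queue simply does not exist.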
There are two issues I've identified that I'd like the community to comment on. (I write RabbitMQ and Logstash explicitly, but the argument holds for any queue/processing framework.)

- Data format. How should the document look when it is handed over to RabbitMQ? Is there a standard format that should be used? OData? By using a standard format it would be easier for other systems to consume data from Manifold as well. For my particular upcoming project I will probably send the content in the JSON format used by Logstash.

- Error handling. Who is responsible should a document fail during content processing? A fire-and-forget approach from Manifold, leaving the doc in the message queue, is the simplest: Logstash is then responsible for handling the error and re-feeding/fixing the document until it succeeds. If there is an API for telling Manifold about failing docs, Logstash could write status messages to a separate queue; reading that queue and updating the doc status in Manifold should be easy.

Any thoughts on this, or interest in general in a RabbitMQ output connector?

Regards,
Christian.
