Hello,

The need for a content processing pipeline has led me to implement an output 
connector for RabbitMQ. In my current setup Logstash reads from the queue, does 
some simple processing and feeds the document to Elasticsearch. Scaling is 
pretty straightforward: just add more Logstash processes. Of all the content 
processing frameworks I've looked at lately, Logstash seems to have the most 
traction at the moment. Support for delete operations (in Logstash) is still 
missing, but is being addressed.
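To make the setup concrete, a pipeline like the one above can be sketched in a 
Logstash configuration. The host, queue and index names are assumptions for 
illustration, and the exact plugin options vary between Logstash versions:

```
input {
  rabbitmq {
    host    => "localhost"
    queue   => "manifold-docs"   # assumed queue name
    durable => true
  }
}
output {
  elasticsearch {
    host  => "localhost"
    index => "manifold"          # assumed index name
  }
}
```

Adding more Logstash processes with the same configuration gives the scaling 
described above, since RabbitMQ distributes messages across consumers of the 
same queue.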

There are two issues I've identified that I'd like the community to comment on 
(I write RabbitMQ and Logstash explicitly, but the argument holds for any 
queue/processing framework):

- Data format. How should the document look when it is handed over to 
RabbitMQ? Is there a standard format that should be used? OData? By using a 
standard format it would be easier for other systems to consume data from 
Manifold as well. For my particular upcoming project I will probably send the 
content in the JSON format used by Logstash.
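As a sketch of that last option, here is how a document could be wrapped in a 
Logstash-style "v1" JSON event before publishing. The `@timestamp` and 
`@version` fields follow the Logstash event schema; the `document_id` field 
name is my own assumption, not a Manifold or Logstash standard:

```python
import json
from datetime import datetime, timezone

def build_event(document_id, content):
    """Wrap a document in a Logstash-style v1 JSON event."""
    return json.dumps({
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "@version": "1",
        "document_id": document_id,  # assumed field name
        "message": content,
    })

event = build_event("doc-42", "body text of the document")
print(event)
```

The resulting string is what the output connector would publish to RabbitMQ, 
so that Logstash's JSON codec can pick it up without extra parsing.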

- Error handling. Who is responsible should a document fail during content 
processing? A fire-and-forget approach from Manifold, leaving the doc in the 
message queue, is the simplest: Logstash is then responsible for handling the 
error and refeeding/fixing the doc until success. If there is an API for 
telling Manifold about failing docs, Logstash can write status messages to a 
separate queue. Reading that queue and updating the doc status in Manifold 
should be easy.
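A consumer of such a status queue could be as simple as the sketch below. The 
message shape (`document_id`, `status`, `error`) is an assumption on my part, 
and the actual callback into Manifold is left as a hypothetical step since no 
such API exists yet:

```python
import json

def handle_status_message(raw):
    """Parse a status message written by Logstash for one document."""
    msg = json.loads(raw)
    doc_id = msg["document_id"]           # assumed field name
    status = msg.get("status", "failed")  # assumed values: "ok" / "failed"
    if status != "ok":
        # Here one would call back into Manifold (hypothetical API) to
        # mark the doc as failed so it can be refed or fixed.
        return (doc_id, "failed", msg.get("error"))
    return (doc_id, "ok", None)

print(handle_status_message(
    '{"document_id": "doc-42", "status": "failed", "error": "mapping conflict"}'
))
```

Running one such consumer per status queue keeps the error-reporting path 
completely separate from the document-feeding path.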





Any thoughts on this, or interest in general for a RabbitMQ output connector?





Regards,

Christian.
