Hi Christian,

Of course there is interest in a RabbitMQ output connector.  Please let us
know any way in which we can help.

>>>>>>
-          Data format. How should the document look when it is handed over
to RabbitMQ. Is there a standard format that should be used? OData? By
using a standard format it would be easier for other systems to consume
data from Manifold as well. For my particular upcoming project I will
probably send the content in the JSON format used by Logstash.
<<<<<<

Basically you have free rein in your output connector to decide what data
format makes sense.  The output you will get from ManifoldCF will be a
RepositoryDocument Java object, which consists of (1) one binary chunk of
data, in stream form, (2) multiple pieces of metadata, which are
presumed to be bounded in size, and (3) some well-known metadata, such as
the content-type of the binary data you are dealing with, if known.  You
should *not* in general presume you can fit the entire binary file in
memory.
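
To make the streaming point concrete, here is a minimal sketch of reading
the binary data in bounded chunks rather than buffering the whole file.
StreamChunker and ChunkSink are hypothetical names of mine, not ManifoldCF
classes, and sending one queue message per chunk is only an assumption
about how a connector might keep memory use bounded.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical helper (not a ManifoldCF class): forwards a stream in bounded chunks. */
final class StreamChunker {

    /** Hypothetical sink; a real connector might publish one queue message per chunk. */
    interface ChunkSink {
        void accept(byte[] chunk) throws IOException;
    }

    /**
     * Reads the stream through a fixed-size buffer and hands each filled portion
     * to the sink, so at most chunkSize bytes are buffered at any time.
     * Returns the total number of bytes forwarded.
     */
    static long forward(InputStream in, int chunkSize, ChunkSink sink) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] buffer = new byte[chunkSize];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            if (read == 0) {
                continue; // nothing read this time; try again
            }
            // Copy only the bytes actually read; a chunk may be shorter than chunkSize.
            sink.accept(Arrays.copyOf(buffer, read));
            total += read;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] document = new byte[10_000]; // stand-in for the document's binary stream
        long sent = forward(new ByteArrayInputStream(document), 4096,
                chunk -> System.out.println("would publish " + chunk.length + " bytes"));
        System.out.println("total bytes forwarded: " + sent);
    }
}
```

Whether the consumer can reassemble chunked messages is a separate design
question; the sketch only shows that the connector side need not hold the
whole document at once.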

>>>>>>
-          Error handling. Who is responsible should a document fail during
content processing? A fire-and-forget approach from Manifold with leaving
the doc in the message queue is the simplest. Logstash is responsible for
handling the error and refeed/fix the error until success. If there is an
API for telling Manifold about failing docs Logstash can write status
messages to a separate queue.  Reading that queue and updating doc status
in Manifold should be easy.

<<<<<<

Output connectors have the following error handling:

(1) ServiceInterruption exception, which indicates that the document cannot
be accepted at the time, but may be accepted if tried again later (and you
can signal how long to wait before retrying);
(2) "Document accepted" return code, indicating that the document was
successfully received;
(3) "Document rejected" return code, indicating that the document has been
determined to be unsuitable, and will thus NOT be retried;
(4) ManifoldCFException exception, which indicates that something fatal and
unexpected happened.  (Code ManifoldCFException.INTERRUPTED should be
thrown only on thread interruption during shutdown.)
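
As a rough illustration of how a queue-based connector might map its own
failures onto these four outcomes, consider the sketch below.  Every type
in it is a hypothetical stand-in (RetryLaterException for the
ServiceInterruption case, FatalDeliveryException for the ManifoldCFException
case, Publisher for the queue client); the size limit and retry delay are
assumed values, and the real connector interface is not reproduced here.

```java
import java.io.IOException;

/** Hypothetical sketch; none of these types are ManifoldCF or RabbitMQ classes. */
final class DeliverySketch {

    /** Stand-ins for the "document accepted" and "document rejected" return codes. */
    enum Outcome { ACCEPTED, REJECTED }

    /** Stand-in for a "try again later" signal; carries the earliest retry time. */
    static final class RetryLaterException extends Exception {
        final long retryAtMillis;
        RetryLaterException(String message, long retryAtMillis, Throwable cause) {
            super(message, cause);
            this.retryAtMillis = retryAtMillis;
        }
    }

    /** Stand-in for a fatal, unexpected failure. */
    static final class FatalDeliveryException extends Exception {
        FatalDeliveryException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical queue client; a real one would wrap the broker library. */
    interface Publisher {
        void publish(byte[] message) throws IOException;
    }

    static final int MAX_MESSAGE_BYTES = 1 << 20;           // assumed limit
    static final long RETRY_DELAY_MILLIS = 5 * 60 * 1000L;  // assumed delay

    static Outcome deliver(byte[] message, Publisher publisher, long nowMillis)
            throws RetryLaterException, FatalDeliveryException {
        // Unsuitable document: reject it so that it is not retried.
        if (message.length > MAX_MESSAGE_BYTES) {
            return Outcome.REJECTED;
        }
        try {
            publisher.publish(message);
            return Outcome.ACCEPTED;
        } catch (IOException e) {
            // Treated here as transient (e.g. broker unreachable): ask for a later retry.
            throw new RetryLaterException("queue unavailable",
                    nowMillis + RETRY_DELAY_MILLIS, e);
        } catch (RuntimeException e) {
            // Anything unexpected is reported as fatal.
            throw new FatalDeliveryException("unexpected delivery failure", e);
        }
    }

    public static void main(String[] args) throws Exception {
        Outcome outcome = deliver(new byte[128], msg -> { }, System.currentTimeMillis());
        System.out.println("outcome: " + outcome);
    }
}
```

Which broker errors count as transient versus fatal is a judgment call for
the connector author; the split above is only one plausible choice.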

Hope this helps.

Karl




On Thu, Nov 21, 2013 at 6:05 AM, Christian M. Rieck <
[email protected]> wrote:

> Hello,
>
> The need for a content processing pipeline has led me to implement an
> output connector for RabbitMQ. In my current setup Logstash reads from the
> queue, does some simple processing and feeds the document to Elasticsearch.
> Scaling is pretty straight forward by adding more Logstash processes. Of
> all the content processing frameworks I've looked at lately Logstash seems
> like the only one that has the most traction at the moment. Support for
> delete operations (in Logstash) is still missing, but is being addressed.
>
> There are two issues that I've identified that I'd like the community to
> comment on: (I write RabbitMQ and Logstash explicitly, but the argument
> holds for any queue/processing framework)
>
> -          Data format. How should the document look when it is handed
> over to RabbitMQ. Is there a standard format that should be used? OData? By
> using a standard format it would be easier for other systems to consume
> data from Manifold as well. For my particular upcoming project I will
> probably send the content in the JSON format used by Logstash.
>
> -          Error handling. Who is responsible should a document fail
> during content processing? A fire-and-forget approach from Manifold with
> leaving the doc in the message queue is the simplest. Logstash is
> responsible for handling the error and refeed/fix the error until success.
> If there is an API for telling Manifold about failing docs Logstash can
> write status messages to a separate queue.  Reading that queue and updating
> doc status in Manifold should be easy.
>
>
>
>
>
> Any thoughts on this, or interest in general for a RabbitMQ output
> connector?
>
>
>
>
>
> Regards,
>
> Christian.
>
