Configuration Management at Transformation Connectors

Rafa Haro Tue, 01 Jul 2014 10:05:04 -0700

Hi guys,

I'm trying to develop my first Transformation Connector. Before starting to code, I read as much documentation as I could and also studied the Tika extractor as an example of a transformation connector. For now I'm just trying to implement an initial version of my connector, starting with something simple and adding complexity later. The first problem I'm facing is configuration management, where I'm probably missing something. In my case, I need a fixed configuration when creating an instance of the connector and an extended configuration per job. Let's say that the connector configuration has to set up a service and the job configuration defines how the service should work for each job. With both configurations, I need to create an object which is expensive to instantiate. Here is where the doubts arise:

1. I would like to initialize the configuration object only once per job execution. Because the configuration is not supposed to change during a job execution, I would like to be able to take the configuration parameters from the ConfigParams and Specification objects and create a single instance of my configuration object.

2. The getPipelineDescription method is quite confusing to me. In the Tika Extractor, this method is used to pack the configuration of the Tika processor into a string. This string is then unpacked again in the addOrReplaceDocumentWithException method to read the configuration back. My question is: why? As far as I understand, the configuration can't change during the job execution, and according to the documentation "the contents of the document cannot be considered by this method, and that a different version string (defined in IRepositoryConnector) is used to describe the version of the actual document". So, if only configuration data can be used to create the output version string, this version string could probably be checked by the system once before starting the job, rather than produced and checked per document, because all the documents are going to produce exactly the same output version string. I'm probably missing something but, looking at the Tika Transformation connector for example, it seems pretty clear that there would be no difference between the output version strings of all the documents, because it uses only configuration data to create the string.

3. In addOrReplaceDocumentWithException, why is the pipelineDescription passed as a parameter instead of the connector Specification? Passing the Specification would let the developer access the configuration without marshalling and unmarshalling it.

4. Is there a way to reuse a single configuration object per job execution? In the Output processor connector, I used to initialize my custom stuff in the connect method (I'm not sure if this strategy is valid anyway), but for Transformation connectors I'm not even sure whether this method is called.
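
To make points 2 and 4 more concrete, here is a rough sketch in plain Java of the kind of reuse I have in mind. It deliberately uses no ManifoldCF API: ConfigCacheSketch, ExpensiveService, pack and serviceFor are all hypothetical names, and the two string fields merely stand in for values that would come from ConfigParams and from the job Specification. The idea is that the packed configuration string serves as a cache key, so the expensive object is built once per distinct configuration and not once per document.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ConfigCacheSketch {

    /** Hypothetical stand-in for the object that is expensive to instantiate. */
    static final class ExpensiveService {
        // Counts constructions so the reuse can be observed.
        static final AtomicInteger BUILD_COUNT = new AtomicInteger();
        final String endpoint; // assumed to come from the connector configuration
        final String mode;     // assumed to come from the per-job configuration

        ExpensiveService(String endpoint, String mode) {
            BUILD_COUNT.incrementAndGet();
            this.endpoint = endpoint;
            this.mode = mode;
        }
    }

    // One instance per distinct packed configuration string.
    private static final Map<String, ExpensiveService> CACHE = new ConcurrentHashMap<>();

    /** Packs both values into one string; the length prefix keeps it unambiguous. */
    static String pack(String endpoint, String mode) {
        return endpoint.length() + ":" + endpoint + mode;
    }

    /** Reverses pack() and builds the expensive object. */
    static ExpensiveService unpackAndBuild(String packed) {
        int colon = packed.indexOf(':');
        int len = Integer.parseInt(packed.substring(0, colon));
        String endpoint = packed.substring(colon + 1, colon + 1 + len);
        String mode = packed.substring(colon + 1 + len);
        return new ExpensiveService(endpoint, mode);
    }

    /** Returns the cached instance for this configuration, building it on first use. */
    static ExpensiveService serviceFor(String packed) {
        return CACHE.computeIfAbsent(packed, ConfigCacheSketch::unpackAndBuild);
    }

    public static void main(String[] args) {
        String packed = pack("http://localhost:8080", "strict");
        ExpensiveService first = serviceFor(packed);   // builds the object
        ExpensiveService second = serviceFor(packed);  // reuses it
        System.out.println(first == second);           // prints "true"
        System.out.println(second.endpoint + " / " + second.mode);
    }
}
```

With something like this, every document processed with the same packed string would share one instance. Whether a static cache of this kind is acceptable inside a connector, or whether the framework already offers a better hook for it, is exactly what I am unsure about.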

Thanks a lot in advance for your help. Please note that the questions are of course not intended as criticism. This mail is just a dump of doubts, and the answers will probably help me to better understand the workflows in ManifoldCF.
