I have a series of data files with two sections: a text header, which I want
to alter using regex patterns, followed by a payload of encoded data.

I call a Groovy script from an ExecuteScript processor, using an
InputStream/OutputStream callback (StreamCallback) to change the header
content. The problem is that I seem to be mangling the non-textual payload
that follows the header, because the script operates on the entire flowfile
stream.

I suspect what I need to do is read only the header portion of each
flowfile from the stream, make my changes to that, and write it back out to
the flowfile stream without disturbing the rest of the flowfile. I don't
know how to do that, and I'm hoping someone can help.
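A minimal sketch of that idea, written as a standalone Groovy method rather than anything NiFi itself provides: scan the header byte by byte until a terminator, decode and edit only those bytes, then copy everything after them verbatim. It assumes (hypothetically) that the header is UTF-8 text ending at the first blank line, i.e. two consecutive LF bytes; `rewriteHeader` and `editHeader` are illustrative names, and the terminator must be adjusted to whatever actually separates header from payload in these files.

```groovy
import java.nio.charset.StandardCharsets

/**
 * Hypothetical helper: copies `rawInput` to `output`, passing only the leading
 * text header through `editHeader`. The header is ASSUMED to be UTF-8 text that
 * ends at the first blank line (two consecutive LF bytes); every byte after that
 * is copied unchanged, so the encoded payload is never decoded as text.
 */
void rewriteHeader(InputStream rawInput, OutputStream output, Closure<String> editHeader) {
    final int LF = 0x0A
    // Buffer the stream so the byte-at-a-time header scan stays cheap; the same
    // buffered stream is reused for the payload copy so no bytes are lost.
    def input = new BufferedInputStream(rawInput)
    def header = new ByteArrayOutputStream()

    int prev = -1
    int b
    // Collect header bytes up to and including the blank-line terminator.
    while ((b = input.read()) != -1) {
        header.write(b)
        if (b == LF && prev == LF) {
            break
        }
        prev = b
    }

    // Decode, edit, and re-encode the header text only.
    String headerText = new String(header.toByteArray(), StandardCharsets.UTF_8)
    output.write(editHeader.call(headerText).toString().getBytes(StandardCharsets.UTF_8))

    // Copy the remaining bytes (the encoded payload) verbatim.
    byte[] buffer = new byte[8192]
    int n
    while ((n = input.read(buffer)) != -1) {
        output.write(buffer, 0, n)
    }
    output.flush()
}
```

Inside the existing StreamCallback, the body could then become something like `rewriteHeader(inputStream, outputStream) { String header -> header.replaceAll(/pattern/, 'replacement') }`, with the regex edits moved into the closure. Splitting on LF bytes is safe for UTF-8, since 0x0A never occurs inside a multi-byte sequence. A file with no blank line at all is treated entirely as header, which matches the current script's behavior on all-text flowfiles.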

Here is my Groovy script so far (business logic removed). It works as
desired on flowfiles that are entirely text, but not on flowfiles that have
a text header followed by encoded data.

import java.util.regex.Pattern
import java.nio.charset.StandardCharsets
import org.apache.commons.io.IOUtils
import org.apache.nifi.processor.io.StreamCallback

def flowFileList = session.get(1000)
if (!flowFileList.isEmpty()) {
    flowFileList.each { flowFile ->
        try {
            flowFile = session.write(flowFile, { inputStream, outputStream ->
                // Reads the whole flowfile (header and payload) as UTF-8 text
                def text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)

                // regex pattern manipulations in the header text content happen here

                outputStream.write(text.getBytes(StandardCharsets.UTF_8))
            } as StreamCallback)
            session.transfer(flowFile, REL_SUCCESS)
        } catch (e) {
            // error logging here
            session.transfer(flowFile, REL_FAILURE)
        }
    }
}

I posted to dev rather than users because of the nature of the question. My
apologies if I should have done otherwise.
