I agree with Joe: it sounds like the flow file content is the plain-text extract of the PDF, while the Elasticsearch processors expect JSON documents. You can use ReplaceText to wrap your content in a JSON object, with something like:
{ "content" : "$1" }

(I didn't try that, but the idea is just to put the original text into a field inside a JSON object.) Alternatively, as Joe mentioned, you may want to consider offering the user a choice of output format (Text, JSON, etc.) in your custom processor.

Regards,
Matt

On Tue, Feb 14, 2017 at 8:27 AM, Joe Witt <joe.w...@gmail.com> wrote:
> Hello
>
> Is the code available to provide feedback on? When you extract aspects
> from the PDF, where are you putting the extracted values? You probably
> want to convert the PDF content into the extracted results in JSON. You
> could also extract aspects of the PDF into flowfile attributes and then
> make JSON in another processor, but this won't scale as well for large
> documents/extracts.
>
> Thanks,
> Joe
>
> On Feb 14, 2017 7:46 AM, "shankhamajumdar" <shankha.majum...@lexmark.com>
> wrote:
>
> Hi,
>
> I am working on a use case where I need to load a PDF document into
> Elasticsearch. I have written a custom NiFi processor using Apache Tika
> which extracts the content of the PDF. The NiFi flow is as follows:
>
> 1. A NiFi GetFile processor gets the PDF file from the source directory.
>
> 2. The custom processor, written using Apache Tika, extracts the text
> from the PDF.
>
> 3. A NiFi PutElasticsearch processor loads the data into Elasticsearch,
> but I am getting the error below:
>
> MapperParsingException[failed to parse]; nested:
> NotXContentException[Compressor detection can only
> be called on some xcontent bytes or compressed xcontent bytes];
>
> Regards,
> Shankha
>
> --
> View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/NiFi-PutElasticsearch-Processor-tp14733.html
> Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.
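One caveat with the ReplaceText approach above: plain string substitution only yields valid JSON if the extracted text contains no quotes, newlines, or other characters that need escaping, and PDF extracts almost always do. A minimal sketch (plain Python, not NiFi code; the sample text is made up) of why naive wrapping fails and what a correctly escaped document looks like:

```python
import json

# Plain-text extract, as a Tika-based custom processor might emit it.
# Quotes and newlines are exactly what trips up naive string templating.
extracted_text = 'Report titled "Q1 Results"\nPage 1 of 10'

# Naive wrapping -- effectively what a ReplaceText rule like
# { "content" : "$1" } does when the text contains quotes or newlines:
naive = '{ "content": "' + extracted_text + '" }'
try:
    json.loads(naive)
    naive_is_valid = True
except json.JSONDecodeError:
    naive_is_valid = False
# naive_is_valid is False: the unescaped quote ends the string early.

# Safe wrapping: let a JSON encoder escape quotes, newlines, etc.
document = json.dumps({"content": extracted_text})
parsed = json.loads(document)  # round-trips cleanly
```

Something like `document` here is what PutElasticsearch needs in the flow file; sending the raw text instead is what produces the NotXContentException, since Elasticsearch cannot detect any JSON (or compressed) content in the bytes. Doing the escaping inside the custom Tika processor (rather than with ReplaceText) sidesteps the problem entirely.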