Re: Question with ExtractText Processor

Mark Payne Wed, 12 Jul 2017 13:04:49 -0700

Atish,

I think there may be a limit on the number of extracted columns, but if
you exceeded that limit, the Processor would be marked invalid, so that is
probably not what you are hitting. If you are trying to use a regex that
has 34 .* segments, then the performance is likely to be awful. Each .* in
a regex is expensive, because the engine may have to backtrack to find
where that segment ends. Doing that 34 times in one expression can be
incredibly expensive.
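
To make the backtracking point concrete, here is a rough Python sketch (not
NiFi code; NiFi uses Java regexes, which backtrack in a similar way). The
sample row and the variable names are made up for illustration, and the
field count is deliberately scaled down to 6 so that the greedy pattern
finishes quickly.

```python
import re

NUM_FIELDS = 6  # scaled down from the 34 fields in the thread (assumption)
line = "|".join(f"v{i}" for i in range(NUM_FIELDS))  # made-up sample row

# Greedy pattern: each (.*) can run across pipe characters, so the engine
# has to backtrack to discover where every field really ends.
greedy = re.compile(r"\|".join(["(.*)"] * NUM_FIELDS))

# Bounded pattern: each group stops at the next pipe, leaving far less
# room for backtracking.
bounded = re.compile(r"\|".join([r"([^|]*)"] * NUM_FIELDS))

fields_greedy = list(greedy.fullmatch(line).groups())
fields_bounded = list(bounded.fullmatch(line).groups())

# No regex at all: split on the delimiter.
fields_split = line.split("|")

print(fields_greedy == fields_bounded == fields_split)
```

All three approaches return the same fields here. The difference is in how
much work the regex engine does, and for the greedy form that work grows
quickly as more .* groups are added.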

Is it possible for you to upgrade to a newer version of NiFi? The newest
version (1.3) includes a handful of Record-oriented Processors. These
should make flow design dramatically easier and should give far better
performance.

So instead of using SplitContent -> ExtractText with regexes -> ReplaceText,
you could simply use ConvertRecord (with a CSV Reader and a JSON Writer),
and it will keep all of the records within a single FlowFile. No need to
fuss with regular expressions or replacing text.
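
As a rough illustration of the record-oriented idea (this is plain Python,
not NiFi; in NiFi the reader and writer are configured as controller
services on ConvertRecord), the sketch below parses a whole pipe-delimited
document and emits one JSON array. The function name, the sample data and
the assumption that the first line is a header are all hypothetical.

```python
import csv
import io
import json


def delimited_to_json(text, delimiter="|"):
    """Parse delimited text with a header row and return a JSON array string."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    # Every row becomes one JSON object; all rows stay in one document,
    # much like records staying in a single FlowFile.
    return json.dumps(list(reader))


sample = "id|name|city\n1|Ann|Oslo\n2|Bob|Lima\n"  # made-up data
result = delimited_to_json(sample)
print(result)
```

The parser handles the delimiter once per row, so no per-field regular
expression or text replacement is involved.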

Thanks
-Mark

> On Jul 12, 2017, at 1:24 PM, Atish Ray <[email protected]> wrote:
> 
> Thanks!!! The regex is working for me with a smaller number of columns.
> I am facing another problem with the ExtractText processor. My
> pipe-delimited file has 34 fields, and I need to convert all 34 fields
> into JSON. My file size is around 30MB. So I am converting from CSV to
> JSON using SplitContent -> ExtractText -> ReplaceText. The queue is stuck
> before ExtractText. Is there any limit on the number of extracted columns?
> 
> 
> 
> 
> --
> View this message in context: 
> http://apache-nifi-developer-list.39713.n7.nabble.com/Question-with-ExtractText-Processor-tp16405p16412.html
> Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.
