Chris Sampson created NIFI-7145:
-----------------------------------
Summary: Chained SplitText processors unable to handle files in some circumstances
Key: NIFI-7145
URL: https://issues.apache.org/jira/browse/NIFI-7145
Project: Apache NiFi
Issue Type: Bug
Affects Versions: 1.11.1
Environment: Docker Image (apache/nifi) running in Kubernetes (1.15)
Reporter: Chris Sampson
Attachments: Broken_SplitText.json, Broken_SplitText.xml, test.csv.tgz
With chained SplitText processors (NiFi 1.11.1 apache/nifi Docker image with
default nifi.properties, although configured to allow secure access in my
environment with encrypted flowfile/provenance/content repositories; I don't know
whether that makes a difference):
* ingest a 40MB CSV file with 50k lines of data (plus 1 header line)
* SplitText - chunk the file into 10k-line segments (including the header in each file)
* SplitText - break each row out into its own FlowFile
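To make the expected output concrete, here is a minimal Python sketch (not NiFi code) of what the two SplitText stages should produce for an input shaped like the attached file. It assumes the first stage copies one header line into each 10k-line chunk (as SplitText's Header Line Count = 1 would). It also assumes, without confirmation from the template, that the second stage keeps the header on each single-row output. The sample data and the helper names `split_with_header` and `split_rows` are hypothetical.

```python
from typing import List


def split_with_header(lines: List[str], chunk_size: int, header_count: int = 1) -> List[List[str]]:
    """Emulate a SplitText stage: copy the header lines into every chunk of `chunk_size` data lines."""
    header, data = lines[:header_count], lines[header_count:]
    return [header + data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def split_rows(chunk: List[str], header_count: int = 1) -> List[List[str]]:
    """Emulate the second stage: one output per data row (header kept, by assumption)."""
    return split_with_header(chunk, 1, header_count)


# Hypothetical input shaped like the report: 1 header line + 50,000 data rows.
lines = ["id,value"] + [f"{i},x" for i in range(50_000)]

chunks = split_with_header(lines, 10_000)                 # first SplitText
rows = [row for chunk in chunks for row in split_rows(chunk)]  # second SplitText

print(len(chunks), len(rows))  # 5 50000
```

In the reported flow the first stage behaves as expected (five chunks); it is the second stage that never emits its per-row outputs.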
The 10k chunking works fine, but the resulting files then sit in the queue
between the processors forever. The second SplitText shows that it is working
but never actually produces anything. I can't see anything in the logs, although
I haven't turned on debug logging to see whether that would provide anything more.
If I reduce the chunk size to 1k lines, the per-row split works fine. Maybe this
is some sort of issue with SplitText and/or the swapping of FlowFiles/content to
the repositories?
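The 10k and 1k settings yield the same total number of per-row FlowFiles. What changes is how many children a single invocation of the second SplitText has to emit from one incoming chunk. A small arithmetic sketch in Python (hypothetical helper `fanout`, row count taken from the report) quantifies that difference:

```python
import math
from typing import Tuple

DATA_ROWS = 50_000  # data rows in the reported CSV, header excluded


def fanout(chunk_size: int, data_rows: int = DATA_ROWS) -> Tuple[int, int, int]:
    """Return (chunks from stage 1, max FlowFiles per stage-2 call, total stage-2 FlowFiles)."""
    chunks = math.ceil(data_rows / chunk_size)
    per_call = min(chunk_size, data_rows)
    return chunks, per_call, data_rows


for size in (10_000, 1_000):
    chunks, per_call, total = fanout(size)
    print(f"chunk size {size}: {chunks} chunks, up to {per_call} FlowFiles per call, {total} total")
```

This report does not establish whether per-invocation fan-out is actually the trigger. The sketch only shows that the failing and working configurations differ in fan-out per call (10,000 vs 1,000), not in total output.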
An example flow/template is attached, along with a file that breaks the flow
(untar it and copy it into /tmp). The second SplitText is set to Concurrency=3 in
the template, but it fails just the same when set to the default Concurrency=1.
SplitRecord would be an alternative (it works fine when I try it), but I can't
use it because we could lose data if the CSV is malformed, specifically when a
row contains more data fields than there are defined headers. The Record
processors throw the extra fields away. I understand that this is normal
behaviour and that's fine, but unfortunately I later need to run ValidateRecord
on each of these rows to check for exactly this kind of invalidity.
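As a sketch of the kind of check the later ValidateRecord step is needed for, the following Python function flags data rows that contain more fields than the header defines. This is exactly the information lost if the extra fields are dropped. The function name `find_overlong_rows` is hypothetical; it uses only the standard-library `csv` module and is not NiFi code. How ValidateRecord itself reports such rows is not shown here.

```python
import csv
import io
from typing import List, Tuple


def find_overlong_rows(text: str) -> List[Tuple[int, int]]:
    """Return (source line number, field count) for data rows with more fields than the header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # reader.line_num is the source line at which the row just read ends.
    return [(reader.line_num, len(row)) for row in reader if len(row) > len(header)]


sample = "a,b,c\n1,2,3\n4,5,6,7\n"
print(find_overlong_rows(sample))  # [(3, 4)]
```

Running this over the raw text before any record-based processing keeps the surplus fields visible, which is the property SplitRecord does not preserve.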