[ https://issues.apache.org/jira/browse/NIFI-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15109592#comment-15109592 ]
ASF GitHub Bot commented on NIFI-1118:
--------------------------------------
Github user markap14 commented on a diff in the pull request:
https://github.com/apache/nifi/pull/135#discussion_r50328396
--- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/SplitText.java ---
@@ -198,23 +208,53 @@ private long countBytesToSplitPoint(final InputStream in, final OutputStream out
return includeLineDelimiter ? bytesRead : bytesRead - 1;
}
- // keep track of what the last byte was that we read so that we can detect \r followed by some other
+ // keep track of what the last byte was that we read so that we can
+ // detect \r followed by some other
// character.
lastByte = nextByte;
}
}
- private SplitInfo countBytesToSplitPoint(final InputStream in, final int numLines, final boolean keepAllNewLines) throws IOException {
+ private SplitInfo readHeader(final int numHeaderLines,
+ final String headerMarker, final InputStream in,
+ final OutputStream out, final boolean keepAllNewLines)
+ throws IOException {
SplitInfo info = new SplitInfo();
-
- while (info.lengthLines < numLines) {
- final long bytesTillNext = countBytesToSplitPoint(in, null, keepAllNewLines || (info.lengthLines != numLines - 1));
- if (bytesTillNext <= 0L) {
- break;
+ boolean isHeaderLine = true;
+
+ // Read numHeaderLines from file, if specificed; a non-zero value takes precedence
+ // over headerMarker character string
+ if (numHeaderLines > 0) {
+ for (int i = 0; i < numHeaderLines; i++) {
+ int bytesRead = readLine(in, out, keepAllNewLines);
+ if (bytesRead == 0) {
+ break;
+ }
+ info.lengthBytes += bytesRead;
+ info.lengthLines++;
+ }
+ // Else, keep reading all lines that begin with headerMarker character string
+ } else if (headerMarker != null) {
+ while (true) {
+ in.mark(0);
--- End diff --
This call to in.mark(0) sets a read limit of 0 bytes, so the mark is only
guaranteed to stay valid until something is read. A later in.reset() then cannot
be relied on to rewind the stream, which essentially makes the mark/reset do
nothing useful. I am not sure I follow the logic of how mark/reset are being
used here.
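For illustration only, here is a minimal sketch of how mark/reset is usually used to peek at the start of a line. It is not the code from the pull request: the class name MarkResetSketch and the helper startsWithMarker are hypothetical, and the sketch assumes the stream supports mark/reset (for example a BufferedInputStream). The point is that the read limit passed to mark() has to cover every byte read before reset() is called.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the SplitText implementation.
public class MarkResetSketch {

    /**
     * Peeks at the next bytes of the stream and reports whether they match
     * the marker, leaving the stream position unchanged.
     */
    static boolean startsWithMarker(final InputStream in, final String marker) throws IOException {
        if (!in.markSupported()) {
            throw new IOException("stream does not support mark/reset");
        }
        final byte[] expected = marker.getBytes(StandardCharsets.UTF_8);

        // The read limit must be at least the number of bytes we may read
        // before calling reset(); here that is the length of the marker.
        in.mark(expected.length);

        boolean matches = true;
        for (final byte b : expected) {
            final int next = in.read();
            if (next != (b & 0xFF)) { // also covers end of stream (-1)
                matches = false;
                break;
            }
        }

        // Rewind to the marked position so the caller can read the line normally.
        in.reset();
        return matches;
    }

    public static void main(final String[] args) throws IOException {
        final byte[] data = "#header\ndata\n".getBytes(StandardCharsets.UTF_8);
        final InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));

        System.out.println(startsWithMarker(in, "#")); // prints "true"
        System.out.println((char) in.read());          // prints "#": the position was restored
    }
}
```

With a read limit equal to the marker length, the peek never reads past what the mark guarantees, so reset() always returns to the start of the line whether or not the marker matched.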
> Enable SplitText processor to limit line length and filter header lines
> -----------------------------------------------------------------------
>
> Key: NIFI-1118
> URL: https://issues.apache.org/jira/browse/NIFI-1118
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Reporter: Mark Bean
> Assignee: Joe Skora
> Fix For: 0.5.0
>
>
> Add the following functionality to the SplitText processor:
> 1) Maximum size limit of the split file(s)
> A new split file will be created if the next line to be added to the current
> split file exceeds a user-defined maximum file size
> 2) Header line marker
> User-defined character(s) can be used to identify the header line(s) of the
> data file rather than a predetermined number of lines
> These changes are additions, not a replacement of any property or behavior.
> In the case of header line marker, the existing property "Header Line Count"
> must be zero for the new property and behavior to be used.