markap14 commented on a change in pull request #5251:
URL: https://github.com/apache/nifi/pull/5251#discussion_r681196777



##########
File path: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/TailFile.java
##########
@@ -976,14 +1032,53 @@ private long readLines(final FileChannel reader, final ByteBuffer buffer, final
         }
     }
 
-    private void flushByteArrayOutputStream(final ByteArrayOutputStream baos, final OutputStream out, final Checksum checksum) throws IOException {
-        baos.writeTo(out);
+    private void flushByteArrayOutputStream(final ByteArrayOutputStream baos, final OutputStream out, final Checksum checksum, final boolean ignoreRegex) throws IOException {
         final byte[] baosBuffer = baos.toByteArray();
-        checksum.update(baosBuffer, 0, baos.size());
+        baos.reset();
+
+        // If the regular expression is being ignored, we need to flush anything that is buffered.
+        // This happens, for example, when a file has been rolled over. At that point, we want to flush whatever we have,
+        // even if the regex hasn't been matched.
+        if (ignoreRegex) {
+            flushLinesBuffer(out, checksum);
+        }
+
+        if (lineStartRegex == null) {
+            out.write(baosBuffer);
+
+            checksum.update(baosBuffer, 0, baosBuffer.length);
+            if (getLogger().isTraceEnabled()) {
+                getLogger().trace("Checksum updated to {}", checksum.getValue());
+            }
+
+            return;
+        }
+
+        final String bufferAsString = new String(baosBuffer, StandardCharsets.UTF_8);
+        final String[] lines = bufferAsString.split("\n");

Review comment:
       No. System.lineSeparator() is specific to the system that NiFi is
running on, and there is no reason to believe that the data was written with
the same line separator. There are generally two possible line endings: \r\n
and \n. If we use System.lineSeparator() and NiFi runs on Windows while
ingesting files written with \n, it will never detect a newline. If it splits
on \n instead, the data is still split whether lines end with \r\n or \n.
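
To make the point above concrete, here is a minimal, self-contained sketch. It is not code from the PR: the class `LineSplitSketch` and the helper `splitLines` are hypothetical names used only for illustration. It splits on `\n` and then drops a single trailing `\r` from each line, because `String.split("\n")` on `\r\n`-terminated data leaves the `\r` at the end of each element. Whether TailFile should strip that `\r` is an assumption made here, not something the diff shows. The literal `"\r\n"` stands in for what `System.lineSeparator()` returns on Windows.

```java
import java.util.Arrays;

class LineSplitSketch {

    // Hypothetical helper: split on "\n" regardless of the host platform,
    // then remove one trailing "\r" so CRLF and LF input yield the same lines.
    static String[] splitLines(final String text) {
        final String[] lines = text.split("\n");
        for (int i = 0; i < lines.length; i++) {
            if (lines[i].endsWith("\r")) {
                lines[i] = lines[i].substring(0, lines[i].length() - 1);
            }
        }
        return lines;
    }

    public static void main(final String[] args) {
        final String unixData = "first\nsecond\n";
        final String windowsData = "first\r\nsecond\r\n";

        // Both inputs are split into the same two lines.
        System.out.println(Arrays.toString(splitLines(unixData)));
        System.out.println(Arrays.toString(splitLines(windowsData)));

        // Splitting LF-only data on the Windows separator finds no boundary,
        // so the whole input comes back as a single element.
        System.out.println(unixData.split("\r\n").length);
    }
}
```

The last statement shows the failure mode described in the comment: with a platform-specific separator of `\r\n`, LF-only data is never split.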




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


