[GitHub] [nifi-minifi-cpp] bakaid commented on a change in pull request #596: MINIFICPP-925 - Fix TailFile hang on long lines

GitBox Wed, 26 Jun 2019 09:37:06 -0700

bakaid commented on a change in pull request #596: MINIFICPP-925 - Fix TailFile 
hang on long lines
URL: https://github.com/apache/nifi-minifi-cpp/pull/596#discussion_r297761084


 ##########
 File path: extensions/standard-processors/processors/LogAttribute.h
 ##########
 @@ -90,26 +90,19 @@ class LogAttribute : public core::Processor {
   class ReadCallback : public InputStreamCallback {
    public:
     ReadCallback(uint64_t size)
-        : read_size_(0) {
-      buffer_size_ = size;
-      buffer_ = new uint8_t[buffer_size_];
-    }
-    ~ReadCallback() {
-      if (buffer_)
-        delete[] buffer_;
+        : buffer_(size)  {
     }
     int64_t process(std::shared_ptr<io::BaseStream> stream) {
-      int64_t ret = 0;
-      ret = stream->read(buffer_, buffer_size_);
-      if (!stream)
-        read_size_ = stream->getSize();
-      else
-        read_size_ = buffer_size_;
-      return ret;
+      if (buffer_.size() == 0U) {
+        return 0U;
+      }
+      int ret = stream->read(buffer_.data(), buffer_.size());
 
 Review comment:
   @phrocker To be honest, I am not sure what the contract really is in these 
cases. I could not find relevant comments or documentation, so I am basing my 
understanding on the usages I have seen and the implementation of 
ContentRepositories, ResourceClaim and BaseStream.
   What this implementation did before I changed it was the following: it 
requested the size of the FlowFile and created a ReadCallback with a buffer 
size equal to the FlowFile's size.
   This ReadCallback is then passed to ProcessSession::read which gets a 
BaseStream from the ContentRepository for the ResourceClaim owned by the 
FlowFile. It then passes this BaseStream to the ReadCallback's process function 
where we are currently.
   The old implementation then read at most buffer_size_ (so the FlowFile's 
size) bytes from this stream.
   **It then checked whether the stream (which is a shared_ptr) was null, right 
after it dereferenced it and called a function on it!**
   If this code path could have ever been reached (which it couldn't have been, 
because we would have gotten a SIGSEGV before that), it would have set the 
read_size_ to the size of the stream, ignoring the return value of the read 
operation on the stream.
   Otherwise (so in every case), it just sets the read_size_ to buffer_size_ 
(again, ignoring the size returned by read), which is, in fact, the FlowFile's 
size.
   So this ReadCallback could only ever have worked, if we could indeed read 
all data in one read operation from the stream provided by the 
ContentRepository. And this is what I think the contract is, and this is how I 
have seen it used throughout the code. (We assume that in this case read will 
return all data, and not just partial data).
   My implementation just cleans up this logic and adds an extra check to see 
whether this contract is really held true.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

[GitHub] [nifi-minifi-cpp] bakaid commented on a change in pull request #596: MINIFICPP-925 - Fix TailFile hang on long lines

Reply via email to