bakaid commented on a change in pull request #596: MINIFICPP-925 - Fix TailFile
hang on long lines
URL: https://github.com/apache/nifi-minifi-cpp/pull/596#discussion_r297761084
##########
File path: extensions/standard-processors/processors/LogAttribute.h
##########
@@ -90,26 +90,19 @@ class LogAttribute : public core::Processor {
class ReadCallback : public InputStreamCallback {
public:
ReadCallback(uint64_t size)
- : read_size_(0) {
- buffer_size_ = size;
- buffer_ = new uint8_t[buffer_size_];
- }
- ~ReadCallback() {
- if (buffer_)
- delete[] buffer_;
+ : buffer_(size) {
}
int64_t process(std::shared_ptr<io::BaseStream> stream) {
- int64_t ret = 0;
- ret = stream->read(buffer_, buffer_size_);
- if (!stream)
- read_size_ = stream->getSize();
- else
- read_size_ = buffer_size_;
- return ret;
+ if (buffer_.size() == 0U) {
+ return 0U;
+ }
+ int ret = stream->read(buffer_.data(), buffer_.size());
Review comment:
@phrocker To be honest, I am not sure what the contract really is in these
cases. I could not find relevant comments or documentation, so I am basing my
understanding on the usages I have seen and the implementation of
ContentRepositories, ResourceClaim and BaseStream.
What this implementation did before I changed it was the following: it
requested the size of the FlowFile and created a ReadCallback with a buffer
size equal to the FlowFile's size.
This ReadCallback is then passed to ProcessSession::read which gets a
BaseStream from the ContentRepository for the ResourceClaim owned by the
FlowFile. It then passes this BaseStream to the ReadCallback's process function
where we are currently.
The old implementation then read at most buffer_size_ (so the FlowFile's
size) bytes from this stream.
**It then checked whether the stream (which is a shared_ptr) was null, right
after it dereferenced it and called a function on it!**
If this code path could have ever been reached (which it couldn't have been,
because we would have gotten a SIGSEGV before that), it would have set the
read_size_ to the size of the stream, ignoring the return value of the read
operation on the stream.
Otherwise (so in every case), it just sets the read_size_ to buffer_size_
(again, ignoring the size returned by read), which is, in fact, the FlowFile's
size.
So this ReadCallback could only ever have worked, if we could indeed read
all data in one read operation from the stream provided by the
ContentRepository. And this is what I think the contract is, and this is how I
have seen it used throughout the code. (We assume that in this case read will
return all data, and not just partial data).
My implementation just cleans up this logic and adds an extra check to see
whether this contract is really held true.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services