[
https://issues.apache.org/jira/browse/MESOS-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cody Maloney updated MESOS-2814:
--------------------------------
Description:
In master there are currently three implementations of the function:
https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/os/read.hpp#L42
https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/os/read.hpp#L82
https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/os/read.hpp#L42
All of them have fairly radically different implementations (One uses C read(),
one uses c++ ifstream, one uses c fopen)
The read() based one does an excess / unnecessary copy / buffer allocation (it
is going to read into one temporary buffer, then copy into the result string.
Would be more efficient to do a .reserve() on the result string, and then fill
the result buffer).
The ifstream/ifstreambuf_iterator ignores that you can have an error partially
through reading a file / doesn't find the error or propagate it up.
The fopen() variant reads one newline separated line at a time. This could
produce interesting / unexpected reading in the context of a binary file. It
also causes glibc to insert null bytes at the end of the buffer it reads
(excess computation). result isn't pre-allocated to be the right length,
meaning that most of the continually read lines will result in realloc() and a
lot of memory copies which will be inefficient on large files.
was:Currently stout os::read() has two radically different implementations
when you give it a {{std::string}} vs. a {{const char *}}. Ideally these have
one implementation that does things like intelligently size the buffer that it
writes into rather than re-allocating repeatedly with every time it lengthens
the string (resulting in copious copying).
> os::read should have one implementation
> ---------------------------------------
>
> Key: MESOS-2814
> URL: https://issues.apache.org/jira/browse/MESOS-2814
> Project: Mesos
> Issue Type: Improvement
> Components: stout
> Reporter: Cody Maloney
> Assignee: Isabel Jimenez
> Labels: mesosphere, tech-debt
>
> In master there are currently three implementations of the function:
>
> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/os/read.hpp#L42
> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/os/read.hpp#L82
> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/os/read.hpp#L42
> All of them have fairly radically different implementations (One uses C
> read(), one uses c++ ifstream, one uses c fopen)
> The read() based one does an excess / unnecessary copy / buffer allocation
> (it is going to read into one temporary buffer, then copy into the result
> string. Would be more efficient to do a .reserve() on the result string, and
> then fill the result buffer).
> The ifstream/ifstreambuf_iterator ignores that you can have an error
> partially through reading a file / doesn't find the error or propagate it up.
> The fopen() variant reads one newline separated line at a time. This could
> produce interesting / unexpected reading in the context of a binary file. It
> also causes glibc to insert null bytes at the end of the buffer it reads
> (excess computation). result isn't pre-allocated to be the right length,
> meaning that most of the continually read lines will result in realloc() and
> a lot of memory copies which will be inefficient on large files.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)