[ 
https://issues.apache.org/jira/browse/MESOS-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cody Maloney updated MESOS-2814:
--------------------------------
    Description: 
In master there are currently three implementations of the function:
 
https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/os/read.hpp#L42
https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/os/read.hpp#L82
https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/os/read.hpp#L42

All of them have fairly radically different implementations (One uses C read(), 
one uses c++ ifstream, one uses c fopen)

The read() based one does an excess / unnecessary copy / buffer allocation (it 
is going to read into one temporary buffer, then copy into the result string. 
Would be more efficient to do a .reserve() on the result string, and then fill 
the result buffer).

The ifstream/ifstreambuf_iterator ignores that you can have an error partially 
through reading a file / doesn't find the error or propagate it up.

The fopen() variant reads one newline separated line at a time. This could 
produce interesting / unexpected reading in the context of a binary file. It 
also causes glibc to insert null bytes at the end of the buffer it reads 
(excess computation). result isn't pre-allocated to be the right length, 
meaning that most of the continually read lines will result in realloc() and a 
lot of memory copies which will be inefficient on large files.

  was:Currently stout os::read() has two radically different implementations 
when you give it a {{std::string}} vs. a {{const char *}}. Ideally these have 
one implementation that does things like intelligently size the buffer that it 
writes into rather than re-allocating repeatedly with every time it lengthens 
the string (resulting in copious copying). 


> os::read should have one implementation
> ---------------------------------------
>
>                 Key: MESOS-2814
>                 URL: https://issues.apache.org/jira/browse/MESOS-2814
>             Project: Mesos
>          Issue Type: Improvement
>          Components: stout
>            Reporter: Cody Maloney
>            Assignee: Isabel Jimenez
>              Labels: mesosphere, tech-debt
>
> In master there are currently three implementations of the function:
>  
> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/os/read.hpp#L42
> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/os/read.hpp#L82
> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/os/read.hpp#L42
> All of them have fairly radically different implementations (One uses C 
> read(), one uses c++ ifstream, one uses c fopen)
> The read() based one does an excess / unnecessary copy / buffer allocation 
> (it is going to read into one temporary buffer, then copy into the result 
> string. Would be more efficient to do a .reserve() on the result string, and 
> then fill the result buffer).
> The ifstream/ifstreambuf_iterator ignores that you can have an error 
> partially through reading a file / doesn't find the error or propagate it up.
> The fopen() variant reads one newline separated line at a time. This could 
> produce interesting / unexpected reading in the context of a binary file. It 
> also causes glibc to insert null bytes at the end of the buffer it reads 
> (excess computation). result isn't pre-allocated to be the right length, 
> meaning that most of the continually read lines will result in realloc() and 
> a lot of memory copies which will be inefficient on large files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to