20: eina_tmpstr: add eina_tmpstr_strftime

Cedric BAIL Thu, 24 Sep 2015 10:54:32 -0700

On Thu, Sep 24, 2015 at 10:20 AM, Tom Hacohen <[email protected]> wrote:
> On 23/09/15 19:56, Cedric BAIL wrote:
>> On Wed, Sep 23, 2015 at 1:18 AM, Tom Hacohen <[email protected]> wrote:
>>> On 22/09/15 18:31, Cedric BAIL wrote:
>>>> Le 22 sept. 2015 09:40, "Tom Hacohen" <[email protected]> a écrit :
>>>>>
>>>>> On 22/09/15 17:32, Cedric BAIL wrote:
>>>>>> Le 22 sept. 2015 02:30, "Daniel Kolesa" <[email protected]> a écrit :
>>>>>>>
>>>>>>> On Mon, Sep 21, 2015 at 11:24 PM, Shilpa Singh <
>>>> [email protected]>
>>>>>> wrote:
>>>>>>>> cedric pushed a commit to branch master.
>>>>>>>>
>>>>>>>>
>>>>>>
>>>> http://git.enlightenment.org/core/efl.git/commit/?id=abaf29cb768375957c9ee0b64d36034c21c618ea
>>>>>>>>
>>>>>>>> commit abaf29cb768375957c9ee0b64d36034c21c618ea
>>>>>>>> Author: Shilpa Singh <[email protected]>
>>>>>>>> Date:   Mon Sep 21 23:48:16 2015 +0200
>>>>>>>>
>>>>>>>>        eina_tmpstr: add eina_tmpstr_strftime
>>>>>>>
>>>>>>> This API seems awfully arbitrary and I don't really see the point.
>>>>>>> Might as well be any other function that prints to strings - are you
>>>>>>> gonna add these too? Sounds horrible to me.
>>>>>>
>>>>>> Is it better to have our code cluttered by static buffer and no overflow
>>>>>> check ? Obviously not. Yes,I expect we refactor all our buffer print
>>>> code
>>>>>> as it is pretty bad right now and may even lead to security issue in
>>>> some
>>>>>> case.
>>>>>
>>>>> Just to chime in, I think this doomsday scenario is a bit exaggerated.
>>>>> I think having code like:
>>>>>
>>>>> char buf[SOME_LEN];
>>>>> strftime(...buf...);
>>>>> return tmpstr_add(buf);
>>>>>
>>>>> Is a much cleaner solution than adding all of the string functions of
>>>>> the world to tmpstr, strbuf, stringshare and etc.
>>>>>
>>>>> I really don't like code duplication, and I think that's where Daniel is
>>>>> coming from.
>>>>>
>>>>> What do you think?
>>>>
>>>> It's a good example that won't always work and increase at best the stack
>>>> size requirement for absolutely no good reason.
>>>
>>> Why wouldn't it always work?
>>> Also if you enclose it in its own sub {}, every compiler will treat the
>>> stack sanely, and even without it, I'd assume optimising compilers will.
>>
>> The previous implementation used in Elementary was not working with
>> some local where it used way more charactere than expected. Work
>> around solution is to be over generous on memory...
>>
>> But compiler have no way to optimize it at all. How could they know
>> before jumping to a function call, that is going to generate data at
>> runtime, how much that function is going to use to limit the size of
>> the buffer on the stack they give to it. No compiler can optimize
>> that. It's just impossible. Every time we put a 4K buffer on the stack
>> all further function call will force that page allocation, that's how
>> things are.
>
> As implied by my comment about the own sub {}, I was talking about
> optimising the over-stack-usage.
>
> char buf[SOMELEN];
> strftime(buf);
> ret = tmpstr_add(buf);
> // buf is not used anymore, stack can be cleared.
>
> That's what I meant. It's true though that compilers can't assume we
> haven't kept the buf pointer and used it somewhere else (which we are
> allowed to until the function ended). Though that's not what you were
> talking about.


That is another case that a compiler can not optimize. It has no clue
that buf is not stored somewhere globally, or that ret doesn't point
to somewhere in it. That's why stack size vary depending on the block
you are in. Also stack never shrink, so once you get those 4K on it,
they will stay there.

> However, looking at the tmpstr code, the solution there is no better. It
> reallocs all the time and implements a whole mechanism there. Wouldn't
> it be much better to create a eina_strftime_alloc (bad name) that
> returns an allocated string of the right size and that can be used in
> strbuf (strbuf_string_steal), tmpstr (add a string steal) and etc?
> That's writing the function once, and having one API for all of eina,
> instead of the proposed method which will add plenty and duplicate all
> around.

strftime doesn't tell you the size of the string you need. It only
tells you if it failed to put the generated string inside the buffer
you gave it. There is no way around doing a realloc loop with that
API.

As for writing the function once, it is the case right now. If I was
to add that function to strbuf or stringshare, then I would obviously
find a way to share the code. If not by just calling the tmpstr
function. In general for Eina, we do not add a function until we have
at least 2 users in our code base. So for now, I do not plan to add
strftime to any other part of Eina, as we do not have the need for it.

>>>> I see no reason to have bad code like that just for fewer line of code.
>>>
>>> It's not a fewer lines of code, it's fewer API and a lot less lines of
>>> code. Also, adding such API implies to developers that they should
>>> expect such API, so you are just feeding the evil loop.
>>
>> And they should use it, as the way they usually manually do it lead to
>> bug. This is actually reducing our code base and also reduce our
>> number of actual bugs.
>
> Have a general purpose function that does that (as above), no manual
> work. Still reduces the codebase, much cleaner.

eina_strftime is I guess what you are trying to say here. Basically
not making it a tmpstr and just use free to get rid of it. I guess
that's an acceptable solution.

>>> Btw, I was thinking about the whole tmpstr solution (which you know I am
>>> not a fan off) last night, and I came up with two solutions I think are
>>> better, and anyhow, are massively more efficient.
>>> The first involves creating new API (but it's still less API than
>>> tmpstr) for genlist and etc that accept a struct instead of a string.
>>> This way we can add more info that will hint who should free it.
>>> This is the more involved, clean solution, that we should maybe adopt
>>> for EFL 2.0.
>>
>> So your solution is to pass a struct where there will be one byte
>> dedicated to say if that string need to be freed ? I think that's far
>> from elegant and fully error prone with no way for the compiler to
>> help on that.
>
> Maybe we'll have more metadata in the future. The struct will be created
> automatically with an helper function, so not error prone at all.

I will have to see it, but I have yet to see what would all those
metadata be. For the moment, I don't see how this could be an
improvement over current code base.

>>> The easy hacky solution, which is a replacement to tmpstr (which is also
>>> a hack) and is much simpler and more efficient, is to use this scheme:
>>> For static strings you just return them:
>>> return "foo";
>>> for dynamic you return this:
>>> return eina_tomstr_add("bar");
>>>
>>> You can free both with eina_tomstr_free(buf);
>>>
>>> The implementation is where the magic is at, in memory, it looks like this:
>>> "\x02bar", the first char being the STX (start of text) char. We just
>>> use eina_tomstr_str_get(buf) whenever we want to use it. It just returns
>>> "buf + 1" if the first char is STX. It's infinitely faster and more
>>> memory efficient than tmpstr, I think also much cleaner. The only bad
>>> thing is that you need to use the str function, but I think that's a
>>> reasonable compromise.
>>
>> That would obviously lead to random bug. If you free a buf that has
>> not been allocated by your tomstr, it will lead to an off by one
>> access, if you are unlucky you will fall on a non allocated page. If
>> you are really unlucky it will fall on an existing data that does have
>> the expected value and call free on a static buffer. Also it won't be
>> that much faster or slower as the current implementation walk a short
>> list that should definitively be only a few elements long.
>
> How so? As I said, genlist gets "\x02bar", so genlist can safely check
> the *first* (not -1) character of the string. It can then pass the
> result of string_get() to where it needs actual text, but it keeps the
> real reference for future processing. I don't see why or how we'll have
> text that starts with "\0x02". If we want to make it even more rare, we
> can use two bytes. Still much more efficient than tmpstr. Why would the
> the current implementation be a short list? You want this to be used all
> around with dynamic strings, this list will grow as usage will grow.

Oh, so you never let the API user know that it is indeed a string and
allways make it a hidden structure. That would make the code base
quite hugly with all those str_get arround for little benefit. Also
having the risk to have a confused API with random behavior sounds bad
to me. If you want to hide information in a less problematic way, you
can hide easily one bit in the pointer itself. As your API wouldn't
autorize the use of the string directly anyway, having the least
significant bit used to say if the pointer is to be freed or not will
be clearly less error prone. Still it's hugly as hell and I think
should be avoided in Efl code base (That's not the first time we have
ruled against using that trick in Eina code base, and I am still
against this kind of trick).

As for why it should stay a short list it is due to what tmpstr is
used for. Code pattern for it is of the form :

tmp = function_returning_tmpstr();
function_call(tmp);
tmpstr_del(tmp);

So either we have a massive recursion (which is wrong and need to be
fixed), or we are leaking tmpstr. tmpstr is not intended to be used
for outside of a block and should not be in a structure for example.
As I said, tmpstr alive at any point in time should be pretty low and
walking that said list never be a performance issue.
-- 
Cedric BAIL

------------------------------------------------------------------------------
_______________________________________________
enlightenment-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel

Re: [E-devel] [EGIT] [core/efl] master 17/20: eina_tmpstr: add eina_tmpstr_strftime

Reply via email to