20: eina_tmpstr: add eina_tmpstr_strftime

Tom Hacohen Mon, 28 Sep 2015 01:55:40 -0700

On 24/09/15 18:53, Cedric BAIL wrote:
> On Thu, Sep 24, 2015 at 10:20 AM, Tom Hacohen <[email protected]> wrote:
>> On 23/09/15 19:56, Cedric BAIL wrote:
>>> On Wed, Sep 23, 2015 at 1:18 AM, Tom Hacohen <[email protected]> wrote:
>>>> On 22/09/15 18:31, Cedric BAIL wrote:
>>>>> Le 22 sept. 2015 09:40, "Tom Hacohen" <[email protected]> a écrit :
>>>>>>
>>>>>> On 22/09/15 17:32, Cedric BAIL wrote:
>>>>>>> Le 22 sept. 2015 02:30, "Daniel Kolesa" <[email protected]> a écrit :
>>>>>>>>
>>>>>>>> On Mon, Sep 21, 2015 at 11:24 PM, Shilpa Singh <
>>>>> [email protected]>
>>>>>>> wrote:
>>>>>>>>> cedric pushed a commit to branch master.
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>> http://git.enlightenment.org/core/efl.git/commit/?id=abaf29cb768375957c9ee0b64d36034c21c618ea
>>>>>>>>>
>>>>>>>>> commit abaf29cb768375957c9ee0b64d36034c21c618ea
>>>>>>>>> Author: Shilpa Singh <[email protected]>
>>>>>>>>> Date:   Mon Sep 21 23:48:16 2015 +0200
>>>>>>>>>
>>>>>>>>>         eina_tmpstr: add eina_tmpstr_strftime
>>>>>>>>
>>>>>>>> This API seems awfully arbitrary and I don't really see the point.
>>>>>>>> Might as well be any other function that prints to strings - are you
>>>>>>>> gonna add these too? Sounds horrible to me.
>>>>>>>
>>>>>>> Is it better to have our code cluttered by static buffer and no overflow
>>>>>>> check ? Obviously not. Yes,I expect we refactor all our buffer print
>>>>> code
>>>>>>> as it is pretty bad right now and may even lead to security issue in
>>>>> some
>>>>>>> case.
>>>>>>
>>>>>> Just to chime in, I think this doomsday scenario is a bit exaggerated.
>>>>>> I think having code like:
>>>>>>
>>>>>> char buf[SOME_LEN];
>>>>>> strftime(...buf...);
>>>>>> return tmpstr_add(buf);
>>>>>>
>>>>>> Is a much cleaner solution than adding all of the string functions of
>>>>>> the world to tmpstr, strbuf, stringshare and etc.
>>>>>>
>>>>>> I really don't like code duplication, and I think that's where Daniel is
>>>>>> coming from.
>>>>>>
>>>>>> What do you think?
>>>>>
>>>>> It's a good example that won't always work and increase at best the stack
>>>>> size requirement for absolutely no good reason.
>>>>
>>>> Why wouldn't it always work?
>>>> Also if you enclose it in its own sub {}, every compiler will treat the
>>>> stack sanely, and even without it, I'd assume optimising compilers will.
>>>
>>> The previous implementation used in Elementary was not working with
>>> some local where it used way more charactere than expected. Work
>>> around solution is to be over generous on memory...
>>>
>>> But compiler have no way to optimize it at all. How could they know
>>> before jumping to a function call, that is going to generate data at
>>> runtime, how much that function is going to use to limit the size of
>>> the buffer on the stack they give to it. No compiler can optimize
>>> that. It's just impossible. Every time we put a 4K buffer on the stack
>>> all further function call will force that page allocation, that's how
>>> things are.
>>
>> As implied by my comment about the own sub {}, I was talking about
>> optimising the over-stack-usage.
>>
>> char buf[SOMELEN];
>> strftime(buf);
>> ret = tmpstr_add(buf);
>> // buf is not used anymore, stack can be cleared.
>>
>> That's what I meant. It's true though that compilers can't assume we
>> haven't kept the buf pointer and used it somewhere else (which we are
>> allowed to until the function ended). Though that's not what you were
>> talking about.
>
> That is another case that a compiler can not optimize. It has no clue
> that buf is not stored somewhere globally, or that ret doesn't point
> to somewhere in it. That's why stack size vary depending on the block
> you are in. Also stack never shrink, so once you get those 4K on it,
> they will stay there.


As I said, this case is unlikely, but the other case I mentioned does 
allow the compiler to optimise:

int foo(void)
{
    const char *bla;
    {
       char buf[LONGBUF];
       strftime(...);
       bla = eina_tmpstr_add;
    }
    // rest of code
    printf("%s\n", bla);
    return 0;
}

this can even be simplified into a macro that accepts a parameter saying 
which function to use to duplicate the string. strdup, eina_tmpstr_add, 
eina_stringshare_add or whatever.

>
>> However, looking at the tmpstr code, the solution there is no better. It
>> reallocs all the time and implements a whole mechanism there. Wouldn't
>> it be much better to create a eina_strftime_alloc (bad name) that
>> returns an allocated string of the right size and that can be used in
>> strbuf (strbuf_string_steal), tmpstr (add a string steal) and etc?
>> That's writing the function once, and having one API for all of eina,
>> instead of the proposed method which will add plenty and duplicate all
>> around.
>
> strftime doesn't tell you the size of the string you need. It only
> tells you if it failed to put the generated string inside the buffer
> you gave it. There is no way around doing a realloc loop with that
> API.
>
> As for writing the function once, it is the case right now. If I was
> to add that function to strbuf or stringshare, then I would obviously
> find a way to share the code. If not by just calling the tmpstr
> function. In general for Eina, we do not add a function until we have
> at least 2 users in our code base. So for now, I do not plan to add
> strftime to any other part of Eina, as we do not have the need for it.

You are adding eina_tmpstr_add() without any users in the code base... 
Anyhow, I think it would be better to add API in a common place, as we 
know it'll be used more, than to add a specific API and then duplicating 
it to a common place.

>
>>>>> I see no reason to have bad code like that just for fewer line of code.
>>>>
>>>> It's not a fewer lines of code, it's fewer API and a lot less lines of
>>>> code. Also, adding such API implies to developers that they should
>>>> expect such API, so you are just feeding the evil loop.
>>>
>>> And they should use it, as the way they usually manually do it lead to
>>> bug. This is actually reducing our code base and also reduce our
>>> number of actual bugs.
>>
>> Have a general purpose function that does that (as above), no manual
>> work. Still reduces the codebase, much cleaner.
>
> eina_strftime is I guess what you are trying to say here. Basically
> not making it a tmpstr and just use free to get rid of it. I guess
> that's an acceptable solution.

Yes, it's exactly what I'm saying. Let's revert this patch and work on 
an eina_strftime instead.

>
>>>> Btw, I was thinking about the whole tmpstr solution (which you know I am
>>>> not a fan off) last night, and I came up with two solutions I think are
>>>> better, and anyhow, are massively more efficient.
>>>> The first involves creating new API (but it's still less API than
>>>> tmpstr) for genlist and etc that accept a struct instead of a string.
>>>> This way we can add more info that will hint who should free it.
>>>> This is the more involved, clean solution, that we should maybe adopt
>>>> for EFL 2.0.
>>>
>>> So your solution is to pass a struct where there will be one byte
>>> dedicated to say if that string need to be freed ? I think that's far
>>> from elegant and fully error prone with no way for the compiler to
>>> help on that.
>>
>> Maybe we'll have more metadata in the future. The struct will be created
>> automatically with an helper function, so not error prone at all.
>
> I will have to see it, but I have yet to see what would all those
> metadata be. For the moment, I don't see how this could be an
> improvement over current code base.

I don't think eina_tmpstr is a good solution that will scale nicely, 
these are just alternative ideas.

>
>>>> The easy hacky solution, which is a replacement to tmpstr (which is also
>>>> a hack) and is much simpler and more efficient, is to use this scheme:
>>>> For static strings you just return them:
>>>> return "foo";
>>>> for dynamic you return this:
>>>> return eina_tomstr_add("bar");
>>>>
>>>> You can free both with eina_tomstr_free(buf);
>>>>
>>>> The implementation is where the magic is at, in memory, it looks like this:
>>>> "\x02bar", the first char being the STX (start of text) char. We just
>>>> use eina_tomstr_str_get(buf) whenever we want to use it. It just returns
>>>> "buf + 1" if the first char is STX. It's infinitely faster and more
>>>> memory efficient than tmpstr, I think also much cleaner. The only bad
>>>> thing is that you need to use the str function, but I think that's a
>>>> reasonable compromise.
>>>
>>> That would obviously lead to random bug. If you free a buf that has
>>> not been allocated by your tomstr, it will lead to an off by one
>>> access, if you are unlucky you will fall on a non allocated page. If
>>> you are really unlucky it will fall on an existing data that does have
>>> the expected value and call free on a static buffer. Also it won't be
>>> that much faster or slower as the current implementation walk a short
>>> list that should definitively be only a few elements long.
>>
>> How so? As I said, genlist gets "\x02bar", so genlist can safely check
>> the *first* (not -1) character of the string. It can then pass the
>> result of string_get() to where it needs actual text, but it keeps the
>> real reference for future processing. I don't see why or how we'll have
>> text that starts with "\0x02". If we want to make it even more rare, we
>> can use two bytes. Still much more efficient than tmpstr. Why would the
>> the current implementation be a short list? You want this to be used all
>> around with dynamic strings, this list will grow as usage will grow.
>
> Oh, so you never let the API user know that it is indeed a string and
> allways make it a hidden structure. That would make the code base
> quite hugly with all those str_get arround for little benefit. Also
> having the risk to have a confused API with random behavior sounds bad
> to me. If you want to hide information in a less problematic way, you
> can hide easily one bit in the pointer itself. As your API wouldn't
> autorize the use of the string directly anyway, having the least
> significant bit used to say if the pointer is to be freed or not will
> be clearly less error prone. Still it's hugly as hell and I think
> should be avoided in Efl code base (That's not the first time we have
> ruled against using that trick in Eina code base, and I am still
> against this kind of trick).

I think the proposed mechanism is much safer than encoding in the 
pointer. Encoding in the pointer will result in bad memory access and 
thus a crash. My suggestion can cause some issues with comparison, 
that's it. You can create mechanisms that make it completely error free. 
For example, you could save it internally (when you do) as a type that 
can't be cast to a string, and then you'd have to use the functions 
otherwise it wouldn't compile. This is just a clean way to maintain the 
string ABI but still making it safe internally.

>
> As for why it should stay a short list it is due to what tmpstr is
> used for. Code pattern for it is of the form :
>
> tmp = function_returning_tmpstr();
> function_call(tmp);
> tmpstr_del(tmp);
>
> So either we have a massive recursion (which is wrong and need to be
> fixed), or we are leaking tmpstr. tmpstr is not intended to be used
> for outside of a block and should not be in a structure for example.
> As I said, tmpstr alive at any point in time should be pretty low and
> walking that said list never be a performance issue.
>

True. If we only use it in genlist (and similar), this is the usage 
pattern and it won't be the end of the world. However, as I said, I 
proposed above a solution I find more clean and more general purpose. 
Can be used in many places and saved for the long term. This is a bit 
diverging from the purpose of this thread though (my bad). It's more 
important that we deal with the issue at hand before we freeze.




Conclusion: let's add eina_strftime and eina_tmpstr_manage_new and 
revert this before the freeze.

--
Tom.

------------------------------------------------------------------------------
_______________________________________________
enlightenment-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel

Re: [E-devel] [EGIT] [core/efl] master 17/20: eina_tmpstr: add eina_tmpstr_strftime

Reply via email to