On 04/04/13 13:46, Carsten Haitzler (The Rasterman) wrote:
> On Thu, 04 Apr 2013 11:49:59 +0100 Tom Hacohen <[email protected]> said:
>
>> On 04/04/13 11:43, Carsten Haitzler (The Rasterman) wrote:
>>> On Thu, 04 Apr 2013 08:39:24 +0100 Tom Hacohen <[email protected]>
>>> said:
>>>
>>>> On 04/04/13 00:52, Carsten Haitzler (The Rasterman) wrote:
>>>>> On Wed, 03 Apr 2013 17:26:42 +0100 Tom Hacohen <[email protected]>
>>>>> said:
>>>>>
>>>>>> On 28/03/13 10:49, Carsten Haitzler (The Rasterman) wrote:
>>>>>>> On Thu, 28 Mar 2013 09:56:40 +0000 Michael Blumenkrantz
>>>>>>> <[email protected]> said:
>>>>>>>
>>>>>>> thats cool. i just had to be grumpy about not having a bug report that
>>>>>>> told me what to look at instantly. i have found another bug. single
>>>>>>> letter words dont find word end markers.
>>>>>>
>>>>>> I just checked it, and it works for me:
>>>>>> #include <stdlib.h>
>>>>>> #include <wchar.h>
>>>>>> #include <stdio.h>
>>>>>> #include <wordbreak.h>
>>>>>>
>>>>>> int main()
>>>>>> {
>>>>>>          {
>>>>>>             const char *lang = "";
>>>>>>             wchar_t *text = L"This is a test";
>>>>>>             size_t len = wcslen(text);
>>>>>>             char *breaks = malloc(len);
>>>>>>             size_t i;
>>>>>>
>>>>>>             printf("%ls\n", text);
>>>>>>
>>>>>>             set_wordbreaks_utf32((const utf32_t *) text, len, lang, breaks);
>>>>>>             for (i = 0 ; i < len ; i++)
>>>>>>                printf("%d", (int) breaks[i]);
>>>>>>             printf("\n");
>>>>>>          }
>>>>>>        return 0;
>>>>>> }
>>>>>>
>>>>>> The output is:
>>>>>> This is a test
>>>>>> 11100100001110
>>>>>>
>>>>>> 1s meaning no break, 0s meaning break here. It does break correctly
>>>>>> around the "a". Could you elaborate more on the bug you were seeing?
>>>>>
>>>>> no NON-breaks around "a". you can't tell that there is a word there at
>>>>> all. it may as well be "   " (all spaces). :)
>>>>>
>>>>>> Cheers,
>>>>>> Tom.
>>>>>>
>>>>>
>>>>>
>>>>
>>>> Yeah, well, you know it using other means. Unfortunately it's beyond the
>>>> scope of the word breaking algorithm... There are no word breaks there,
>>>> thus the algorithm produces none. You probably need to just skip whites
>>>> in your code, not only rely on the wordbreak data when "merging" the
>>>> whites.
>>>
>>> and that is a problem as the word next/prev stuff relies on this.. and what
>>> are "whites" then? (from the word breaking point of view)... eg ' is NOT
>>> white. ( is. etc....
>>>
>>>
>>
>> Well, word breaking has nothing to do with whites (well, they happen to
>> be in a class that separates words, but that's it). I wouldn't change
>> the word next/prev functions themselves, I'd just change the way they
>> are used in edje/elm. I.e, something like:
>>
>> if (is_white(cur_char))
>> {
>>      skip_whites;
>>      skip_word;
>> }
>> else
>> {
>>      skip_word;
>>      skip_whites;
>> }    
>
> and therein lies the rub. "what is a white" when it comes to word separation
> assuming white == separator = eg " ", "\t", "\n", ")", "(", "." etc.

That depends on the behaviour you'd like to implement. What you are doing
goes beyond the scope of word separation, so you are not really relying on
the word break data alone anyway. You have to decide what you'd like to
remove when you do it. If you'd like to remove all the spaces, tabs and ",",
implement that; otherwise, do something else. It's up to you.
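
For illustration only, here is a minimal sketch in C of the pseudocode quoted
above. It is not the edje/elm code. The names is_white(), skip_whites(),
skip_word() and word_next() are hypothetical, and the set of "whites" (space,
tab, newline, comma) is just one possible policy. The breaks array is assumed
to use the values from the test output earlier in the thread (0 = break,
1 = no break), with a 0 read as "break after this character".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Break values as printed earlier in the thread (assumed meaning):
 * 0 = break after this character, 1 = no break. Local names only. */
enum { BRK_HERE = 0, BRK_NONE = 1 };

/* Hypothetical policy: what this caller chooses to treat as "white".
 * The word breaking algorithm does not define this set; the caller does. */
static bool is_white(char c)
{
    return c != '\0' && strchr(" \t\n,", c) != NULL;
}

/* Advance past a run of caller-defined whites. */
static size_t skip_whites(const char *text, size_t len, size_t pos)
{
    while (pos < len && is_white(text[pos]))
        pos++;
    return pos;
}

/* Advance to just past the next word break, using only the break data. */
static size_t skip_word(const char *breaks, size_t len, size_t pos)
{
    while (pos < len && breaks[pos] != BRK_HERE)
        pos++;
    return pos < len ? pos + 1 : len;
}

/* Move forward by one word: the order of the two skips depends on
 * whether the cursor currently sits on a white. */
static size_t word_next(const char *text, const char *breaks,
                        size_t len, size_t pos)
{
    if (pos < len && is_white(text[pos])) {
        pos = skip_whites(text, len, pos);
        pos = skip_word(breaks, len, pos);
    } else {
        pos = skip_word(breaks, len, pos);
        pos = skip_whites(text, len, pos);
    }
    return pos;
}
```

With "This is a test" and the break data 11100100001110, calling word_next()
repeatedly from offset 0 gives 5, 8, 10 and 14, so the single-letter "a" is
visited like any other word. Changing what counts as a white only means
changing is_white().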

--
Tom.


_______________________________________________
enlightenment-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel