Re: The management of the encoding process of emoji

2017-07-07 Thread William_J_G Overington via Unicode
An issue that seems to be coming into prominence is that as a result of the 
requirement that emoji proposals should not be overly specific, some recent 
proposals seem to be trying to emphasise that they are not overly specific by 
suggesting that the particular emoji proposed could mean various things.

This seems to present increasing ambiguity of meaning.

http://unicode.org/emoji/selection.html#Specific

Now, the "overly" in "overly specific" is rather subjective in its interpretation.

Yet is the pendulum swinging too far the other way perhaps?

Some readers may already know of the following video from the Unicode 39 
Conference in 2015.

https://www.youtube.com/watch?v=9ldSVbXbjl4

William Overington

Friday 7 July 2017



Re: Unicode education in the professional world

2017-07-07 Thread Philippe Verdy via Unicode
2017-07-07 19:02 GMT+02:00 Doug Ewell via Unicode:

> Oracle FAQ:

> While UTF8 uses only 2 bytes to store data AL32UTF8 uses 2 or 4 bytes.
>
> Unicode and UTF-8 have been around a long time by now. The fact that
> there is still fake news like this out there, steering our less
> Unicode-aware colleagues waaay down the wrong path, is disconcerting.
>

Well, these are old archived docs that have not been corrected in a long time.
FAQs are rarely reviewed once published and frequently become obsolete
when they suggest old solutions for problems that no longer exist, or old
bad workarounds with their known caveats. They were designed only for
specific software versions and kept as is because newer versions are
documented elsewhere (but older versions may still be in use). The
situation is even worse in "community pages": their interest moves over time
to something else, and no one in these communities has a dedicated, mandatory
task to review old documents made by others; no one leads them or can tell
them what to do on a set schedule.
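
As a quick check on the byte-count claim quoted above, the short Python sketch
below prints how many bytes standard UTF-8 uses for a few sample characters.
The sample characters are my own choice for illustration, and the sketch says
nothing about how Oracle stores data internally; it only exercises UTF-8 as
implemented by Python's built-in codec.

```python
# Sample characters chosen for illustration (not taken from the Oracle FAQ).
samples = {
    "A": "LATIN CAPITAL LETTER A",
    "\u00e4": "a with diaeresis",
    "\u30cf": "KATAKANA LETTER HA",
    "\U0001F955": "an emoji outside the 16-bit range",
}

def utf8_length(ch: str) -> int:
    """Return the number of bytes in the UTF-8 encoding of one character."""
    return len(ch.encode("utf-8"))

for ch, label in samples.items():
    print(f"U+{ord(ch):04X} ({label}): {utf8_length(ch)} byte(s)")
```

UTF-8 uses between one and four bytes per code point, so neither "only 2
bytes" nor "2 or 4 bytes" describes it.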


Re: Unicode education in UK Schools

2017-07-07 Thread Asmus Freytag via Unicode
I performed a quick search for "Informatik und Unicode" to see whether I 
could find documents from German academic institutions discussing 
Unicode in the context of computer science (Informatik).


Among the first page of search results I found a number of summaries and 
presentations that may have been (or possibly are) usable as 
introductory lectures.


One item looked like it could have been intended as source material for 
secondary schools rather than for use in the University.


I also checked whether there are accessible homework assignments that 
mention Unicode ("Hausaufgabe Unicode"). I didn't go very deep, but it 
seems that it's not untypical to relegate Unicode to a sidebar, 
explaining the "\u" notation and mentioning that you get ASCII if you 
set the upper byte to 0 (in a UTF-16 string, as supported by Java etc.).
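
The point about the upper byte can be illustrated with a small Python sketch. 
Python is used here only for illustration (the material described above 
refers to Java), and the helper name utf16_units is made up. It encodes a 
string as big-endian UTF-16 and shows that each ASCII character becomes a 
16-bit code unit whose upper byte is 0.

```python
def utf16_units(text):
    """Return (upper byte, lower byte) for each UTF-16 code unit of text."""
    data = text.encode("utf-16-be")  # big-endian, no byte order mark
    return [(data[i], data[i + 1]) for i in range(0, len(data), 2)]

# "\u0041" is just another way of writing "A".
assert "\u0041" == "A"

for upper, lower in utf16_units("Hi"):
    print(f"upper=0x{upper:02X} lower=0x{lower:02X} -> {chr(lower)!r}")

# A non-ASCII character such as a with macron (U+0101) has a non-zero upper byte.
print(utf16_units("\u0101"))
```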


I've not (yet) located any assignments that try to address any of the 
"tricky" issues in the use of Unicode.


A./


On 7/7/2017 2:02 AM, Andre Schappo via Unicode wrote:


There is some evidence that Unicode is now being introduced to 
Computer Science pupils in UK Schools. Hove Park School give a summary 
of their Computer Science curriculum for Years 8 and 9 
http://www.hovepark.brighton-hove.sch.uk/department/computer-science


From Year 9 curriculum summary:  "• Students code text into binary 
using ASCII and understand the limitations of this and the need for 
Unicode"


I think it unlikely they give much coverage of Unicode at Hove Park 
School but it is a promising start. Personally I am much encouraged, 
as Computer Science education in the UK, at all levels, continues to 
be dominated by ASCII.


…and…

as part of my continuing endeavours to get Computer Science/IT/ICT 
Internationalization on the School/College/University curricula I 
recently set up a Google discussion forum 
https://groups.google.com/forum/#!forum/computer-science-curriculum-internationalization 
If you know of any academics who might be interested, please do let them 
know of this new forum. Unicode is, of course, a fundamental building 
block for internationalization and so should feature prominently in 
Computer Science teaching, at all levels.


André Schappo





Unicode education in the professional world

2017-07-07 Thread Doug Ewell via Unicode
Sort of along the lines of "education"...

I've been helping a colleague who is using the Oracle database and
trying to work through a customer's character conversion and mojibake
issues. I started suspecting the NLS_LANG variable and looked up some
references, and found the following alternative facts on the Oracle FAQ
and community pages:

> SQL> SELECT DUMP(col,1016)FROM table;
>
> Typ=1 Len=39 CharacterSet=UTF8: 227,131,143,227,131,170
>
> returns the value of a column consisting of 3 Japanese characters in
> UTF8 encoding . For example the 1st char is 227(*255)+131.

and:

> While UTF8 uses only 2 bytes to store data AL32UTF8 uses 2 or 4 bytes.

Unicode and UTF-8 have been around a long time by now. The fact that
there is still fake news like this out there, steering our less
Unicode-aware colleagues waaay down the wrong path, is disconcerting.
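
For comparison, here is a minimal Python sketch that decodes the six byte
values shown in the quoted DUMP output as standard UTF-8. The helper name
decode_3byte is made up for this example, and it handles only the three-byte
form; it is meant to show the bit arithmetic, not to replace a real decoder.

```python
# Byte values copied from the quoted DUMP output.
dump_bytes = bytes([227, 131, 143, 227, 131, 170])

def decode_3byte(b0: int, b1: int, b2: int) -> int:
    """Decode one three-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx)."""
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)

first = decode_3byte(*dump_bytes[0:3])
second = decode_3byte(*dump_bytes[3:6])
print(f"U+{first:04X} U+{second:04X}")

# The built-in decoder agrees with the manual arithmetic.
text = dump_bytes.decode("utf-8")
assert [ord(c) for c in text] == [first, second]

# The FAQ's formula yields a number that matches neither code point.
faq_value = 227 * 255 + 131
print(faq_value)
```

The six bytes shown form two three-byte sequences, that is, two katakana
characters (U+30CF and U+30EA). The code point comes from masking and
shifting the payload bits, not from multiplying the first byte by 255.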

--
Doug Ewell | Thornton, CO, US | ewellic.org



Re: Unicode education in UK Schools

2017-07-07 Thread Asmus Freytag via Unicode

On 7/7/2017 12:55 PM, Doug Ewell via Unicode wrote:

> Asmus Freytag wrote:
>
>> I've not (yet) located any assignments that try to address any of the
>> "tricky" issues in the use of Unicode.
>
> That might be a good thing. Many introductory lessons or chapters or
> talks about Unicode dive almost immediately into the complexities and
> weirdnesses, much more so than with other technical topics. This scares
> newbies and they walk away thinking every aspect of Unicode is complex
> and weird.


For a CS curriculum you really want more than asking students to use 
Unicode to spell their name for a modified "Hello World!" program. (For 
a German university, this is an interesting assignment: at least half 
of the students, if not more, would be able to complete it using the 
ASCII subset alone. Except for a small minority, the others would not 
actually need to use something like the \u syntax either, as the local 
keyboard would work for their names.)


Some of the presentations I found did mention collation and similar 
issues (and gave non-Latin examples) but I have not located any homework 
assignments that cover any of these issues (and they are not corner 
cases, but the ordinary complexity of text data).
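
To make the collation point concrete, here is a small Python sketch. The word 
list is invented for illustration, and the accent-stripping key is only a 
crude stand-in for real collation (which would follow the Unicode Collation 
Algorithm with language-specific tailoring); it is shown only to contrast 
with plain code point order.

```python
import unicodedata

# Made-up sample data; "\u00c4pfel" is "Apfel" with A-diaeresis.
words = ["Zebra", "apfel", "\u00c4pfel", "Apfel", "zebra"]

def crude_key(word: str) -> str:
    """Strip combining marks and fold case. This is NOT a real collation key."""
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base.casefold()

by_code_point = sorted(words)               # raw code point order
by_crude_key = sorted(words, key=crude_key)  # accents and case ignored

print(by_code_point)
print(by_crude_key)
```

In code point order the word starting with A-diaeresis sorts after "zebra", 
which is rarely what a reader of German expects.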


A./


Re: Unicode education in UK Schools

2017-07-07 Thread Doug Ewell via Unicode
Asmus Freytag wrote:

> I've not (yet) located any assignments that try to address any of the
> "tricky" issues in the use of Unicode. 

That might be a good thing. Many introductory lessons or chapters or
talks about Unicode dive almost immediately into the complexities and
weirdnesses, much more so than with other technical topics. This scares
newbies and they walk away thinking every aspect of Unicode is complex
and weird.
 
--
Doug Ewell | Thornton, CO, US | ewellic.org



Unicode education in UK Schools

2017-07-07 Thread Andre Schappo via Unicode

There is some evidence that Unicode is now being introduced to Computer Science 
pupils in UK Schools. Hove Park School give a summary of their Computer Science 
curriculum for Years 8 and 9 
http://www.hovepark.brighton-hove.sch.uk/department/computer-science

From Year 9 curriculum summary:  "• Students code text into binary using ASCII 
and understand the limitations of this and the need for Unicode"

I think it unlikely they give much coverage of Unicode at Hove Park School but 
it is a promising start. Personally I am much encouraged, as Computer Science 
education in the UK, at all levels, continues to be dominated by ASCII.

…and…

as part of my continuing endeavours to get Computer Science/IT/ICT 
Internationalization on the School/College/University curricula I recently 
set up a Google discussion forum 
https://groups.google.com/forum/#!forum/computer-science-curriculum-internationalization
If you know of any academics who might be interested, please do let them know 
of this new forum. Unicode is, of course, a fundamental building block for 
internationalization and so should feature prominently in Computer Science 
teaching, at all levels.

André Schappo



Re: Unicode education in UK Schools

2017-07-07 Thread William_J_G Overington via Unicode
Around 1991 I was shopping in a supermarket and I noticed some product that I 
was buying had its ingredients list in a lot of languages.

I have been interested in typography and languages since the 1960s. During the 
1960s I was given a copy of the Riscatype Accents Catalogue.

A page of particular interest had a list of the accented characters needed to 
typeset various languages of Europe. This was only of languages that used Latin 
script. Esperanto was in the list.

This list fascinated me.

For example, it mentioned the u diaeresis used in French, though I learned 
later that words that have a u diaeresis in French are rather rare.

There were the accents used for various Scandinavian languages. Each of these 
languages, if I remember correctly, has a different selection of accented 
characters from the others.

I found that the character a tilde, as I now know it to be called, is only used 
in Portuguese. Some years later, in the early 1970s, two researchers were 
trying to translate a research paper using a Spanish dictionary and having 
great problems. I glanced at the text and said that it was not Spanish, it was 
Portuguese. I was asked if I spoke Portuguese and I replied that I did not and 
mentioned my interest in typography.

As I was saying, around 1991 I was shopping in a supermarket and I noticed some 
product that I was buying had its ingredients list in a lot of languages.

Thinking about this, I devised a scenario that I called The Café Äpfel.

https://forum.high-logic.com/viewtopic.php?p=5311#p5311

Around the same time I set up a roomful of PCs so that the start-up page of 
each had text at the lower edge showing the sentence "Good Day." in about six or 
seven languages.

There were "Good Day", "Bonjour", German and Italian versions, "Bonan Tagon" 
(which is Esperanto) and one or two others. I sought advice from linguists for 
some of them. Fortunately the Esperanto version did not need any accented 
characters; otherwise it would not have been possible to include it at that time.

Here is what I wrote about The Café Äpfel in 2006 in the above-linked 
High-Logic Forum post.

quote

Many years ago I devised a scenario to encourage people to learn how to enter 
words with accented characters in them even if they did not know the language. 
I called it The Café Äpfel and the idea was that text from ingredients lists 
from multilingual food packaging could be keyed. The Café Äpfel would have 
menus in English, French, German and the language of the musicians and singers 
who were performing in the café that evening. I had this idea of a television 
show series with each episode combining cookery, computing and music with 
actors playing the continuing characters and guest musicians and singers 
arriving as guest stars.

Well, a Portuguese band and singer would be fairly straightforward.

Once the musicians come from further afield the computing gets rather more 
complicated! :-)

end quote

So can the idea of The Café Äpfel be updated, extended so as to promote the use 
of Unicode, and applied to help with education?

For example, the original idea included a television series. Now there is 
widespread production of videos.

Previously I wrote:

> Once the musicians come from further afield the computing gets rather more 
> complicated! :-)

What if the musicians are from Latvia?

What if the musicians are from Bulgaria?

What if the musicians are from Japan?

What if the musicians are from … well, how about dividing the class into 
small groups and giving each group a language to investigate?

They could all use emoji as well if you like!

The whole exercise could take them beyond 7-bit to 8-bit, beyond 8-bit to 
16-bit, beyond 16-bit to 21-bit.
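
The progression from 7-bit to 21-bit could be shown to students with a few 
lines of Python. This is only a sketch: the sample characters are my own 
choice, and the function simply reports the smallest of the four widths 
mentioned above that can hold a character's code point.

```python
def width_class(ch: str) -> int:
    """Return the smallest of 7, 8, 16 or 21 bits that holds ord(ch)."""
    bits = ord(ch).bit_length()
    for width in (7, 8, 16, 21):
        if bits <= width:
            return width
    raise ValueError("not a Unicode code point")

# e, e-acute, a-macron, katakana HA, carrot emoji
samples = ["e", "\u00e9", "\u0101", "\u30cf", "\U0001F955"]
for ch in samples:
    print(f"U+{ord(ch):04X} fits in {width_class(ch)} bits")
```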

Grocery packaging, yes, but today there is the PanLex database too. 
https://www.panlex.org/

So how about, as an exercise, having the students typeset the list of 
ingredients of a gluten-free vegetable stew?

There could be a list of several vegetables and the students could use the 
PanLex database and Google Translate to look them up and then typeset the menu, 
making use of Unicode code charts to find the code point of each accented 
character and finding out about that character.
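
As a complement to browsing the code charts, students could also look 
characters up programmatically. The sketch below uses Python's standard 
unicodedata module; the function name describe is made up, and the sample 
characters are the accented letters mentioned in this thread.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the code point and Unicode character name of ch."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# e-acute, A-diaeresis, c-caron, a-macron
for ch in ["\u00e9", "\u00c4", "\u010d", "\u0101"]:
    print(ch, describe(ch))
```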

For example, the reason why a number of Central European languages each have a 
c caron in them. Some interesting history there.

The first exercises could use languages that only use 8-bit characters, so as 
to get started and get some printouts produced.

Maybe French, German, Portuguese and Swedish.
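
One possible way to check whether a given word stays within 8-bit characters 
is to try encoding it as ISO 8859-1 (Latin-1), whose 256 characters coincide 
with the first 256 Unicode code points. This is a sketch only, and the 
function name is made up.

```python
def fits_in_8_bits(text: str) -> bool:
    """True if every character of text has a code point below 256."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

print(fits_in_8_bits("Caf\u00e9 \u00c4pfel"))  # e-acute and A-diaeresis
print(fits_in_8_bits("\u0101"))                # a with macron, U+0101
```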

I have tried looking for carrot in the PanLex website.

https://apps.panlex.org/panlinx/

https://apps.panlex.org/panlinx/gp/29

https://apps.panlex.org/panlinx/gp/29/sub/8581

https://apps.panlex.org/panlinx/ex/368537

That was fortunate: the Latvian word for carrot has an a macron in it.

So if The Café Äpfel is having musicians and singers from Latvia to perform, 
and the vegetable stew has carrots in it, the students need to get an a macron