[ngram] Re: Upper Half of ASCII Character Table

ted_pedersen Sat, 11 May 2013 10:23:31 -0700

There has been some previous discussion of encoding issues on the list; for 
example, see the thread that starts here:


http://tech.groups.yahoo.com/group/ngram/message/211

I'll dig around a little more and see what else I can find.

More soon,
Ted

--- In ngram@yahoogroups.com, "Dian Jia" <dianj_83@...> wrote:
>
> Hi there,
> 
> I would like to add support for the upper half of the ASCII character table 
> to NSP. I found the following possible solution in the package. Any 
> suggestions on how to apply it to the latest version, where the code has changed?
> 
> "Here's an idea (courtesy of Michal Kren) - you can make the following 
> modification to line 165 of count.pl (in v0.3):
> 
> while ( /(([\w\x80-\xff]+)|[,.!?;:])/g )
> 
> This will extend the "matching" for words to include ASCII characters
> numbered 128 to 255 (the upper half of the table). This includes a
> number of accented characters and other alphabets, so it might possibly
> include the characters you are interested in."
> 
> Thanks a lot
> Di
>
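
For illustration, here is a minimal Perl sketch of how the pattern quoted above splits a line into tokens. The tokenize helper and the sample string are hypothetical and are not taken from count.pl; the sketch also assumes the input is in a single-byte encoding such as Latin-1 rather than UTF-8.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of NSP): collect the tokens matched by
# the pattern from the quoted suggestion. A word is a run of \w
# characters or bytes 0x80-0xFF; each listed punctuation mark is a
# token on its own.
sub tokenize {
    my ($line) = @_;
    my @tokens;
    while ( $line =~ /(([\w\x80-\xff]+)|[,.!?;:])/g ) {
        push @tokens, $1;
    }
    return @tokens;
}

# Sample line as Latin-1 bytes: \xe9 is e-acute, \xef is i-diaeresis.
my @tokens = tokenize("caf\xe9, na\xefve!");

# The accented bytes stay inside their words instead of splitting them.
printf "%d tokens\n", scalar @tokens;    # prints "4 tokens"
```

The four tokens are the two words, with their accented bytes intact, plus the comma and the exclamation mark. Because the character class works on byte values, it is probably best suited to single-byte encodings; treat the sketch as a way to try the pattern on your own data, not as the code in the current release.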
