Re: [algogeeks] How to read multiple utf-8 encoded string from stdin

Pradeep Dubey Tue, 26 Nov 2013 06:28:11 -0800

Is this what you are asking for?

#include<stdio.h>
#include<string.h>


/* Thresholds for classifying the first byte of a UTF-8 sequence */
#define TWO_BYTE 0xC0     /* 0xC0-0xDF starts a 2-byte sequence */
#define THREE_BYTES 0xE0  /* 0xE0-0xEF starts a 3-byte sequence */
#define FOUR_BYTES 0xF0   /* 0xF0-0xF7 starts a 4-byte sequence */

#define MAX_BUFF 200      /* maximum characters per input line */

/* Count the non-ASCII characters in a NUL-terminated UTF-8 string */
int non_ascii_count(const char arr[]) {
        const unsigned char *ch = (const unsigned char *)arr;
        int non_ascii = 0, count;

        while (*ch != '\0') {
                /* Sequence length comes from the lead byte. ASCII bytes and
                   stray continuation bytes (0x80-0xBF) advance by one. */
                count = 1;
                if (*ch >= FOUR_BYTES)
                        count = 4;
                else if (*ch >= THREE_BYTES)
                        count = 3;
                else if (*ch >= TWO_BYTE)
                        count = 2;
                /* Every multi-byte sequence is one non-ASCII character */
                if (count > 1)
                        non_ascii++;
                /* Skip the sequence without stepping past the terminator */
                while (count-- > 0 && *ch != '\0')
                        ch++;
        }
        return non_ascii;
}

int main(void)
{
        /* Up to MAX_BUFF characters of at most 4 bytes each, plus \n and \0 */
        char buff[MAX_BUFF * 4 + 2];

        /* fgets stores at most sizeof(buff) - 1 bytes and appends the \0 */
        while (fgets(buff, sizeof(buff), stdin) != NULL)
                printf("%d\n", non_ascii_count(buff));
        return 0;
}
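
The count can also be done without tracking sequence lengths. In valid UTF-8 every non-ASCII character starts with exactly one byte of the form 11xxxxxx, and its remaining bytes have the form 10xxxxxx, so counting the 11xxxxxx bytes gives the same number. A minimal sketch, assuming the input is valid UTF-8 as the problem statement promises (count_non_ascii_chars is just a name I made up):

```c
#include <stddef.h>

/* Hypothetical alternative: count lead bytes of multi-byte sequences only */
size_t count_non_ascii_chars(const char *s)
{
        size_t n = 0;

        for (; *s != '\0'; s++) {
                unsigned char b = (unsigned char)*s;
                /* 11xxxxxx is a lead byte; 10xxxxxx is a continuation byte */
                if ((b & 0xC0) == 0xC0)
                        n++;
        }
        return n;
}
```

Since it only ever advances one byte at a time, it cannot run past the terminating \0 even on truncated input.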


On Tue, Nov 26, 2013 at 7:43 PM, Nishant Pandey <
[email protected]> wrote:

> This approach only helps on Linux. When I use a UTF-8 encoded input file on
> Windows, I can't read the characters. Secondly, how do I count the non-ASCII
> characters in a UTF-8 string? Does anyone have any idea about this?
>
>
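
On the Windows point: the message doesn't say what actually fails, so this is only a guess. Editors such as Notepad can save UTF-8 files with a byte order mark (the bytes EF BB BF) at the start, and a byte-wise counter would report it as one extra non-ASCII character on the first line. A small sketch that steps over it (skip_utf8_bom is a hypothetical helper, not something from the thread):

```c
#include <string.h>

/* Hypothetical helper: return a pointer past a leading UTF-8 BOM, if any */
const char *skip_utf8_bom(const char *line)
{
        /* strncmp stops at the terminator, so short lines are safe */
        if (strncmp(line, "\xEF\xBB\xBF", 3) == 0)
                return line + 3;
        return line;
}
```

It would be applied to the first line only, before that line is passed to the counting function.
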
> On Mon, Nov 25, 2013 at 11:50 AM, Karthikeyan V.B <[email protected]> wrote:
>
>>
>>
>>   From StackOverflow,
>>
>> -------------------------------
>>
>> fgets() can decode UTF-8 encoded files if you use Visual Studio 2005 and
>> up. Change your code like this:
>>
>>
>> infile = fopen(inname, "r, ccs=UTF-8");
>>
>>
>>
>> On Sat, Nov 23, 2013 at 8:25 PM, Nishant Pandey <
>> [email protected]> wrote:
>>
>>> Q) Write a *C program* that reads multiple UTF-8 encoded strings from STDIN
>>> (1 string per line), counts all *non-ascii* characters (ascii characters
>>> have ordinal decimal values 0 to 127) and prints the total non-ascii
>>> character count to STDOUT (1 number per line).
>>>
>>> Constraints:
>>>
>>>
>>>    - You cannot use any *wchar.h* service in your program.
>>>    - The UTF-8 strings supplied to you can have *1 or more whitespaces* in
>>>    them.
>>>    - No input string will have a character length greater than *200*
>>>    (including spaces)
>>>    - You will be given multiple lines of input (1 string per line).
>>>    - Input will be limited to UTF-8 encoded strings and will not
>>>    contain any garbage values.
>>>
>>>  --
>>> You received this message because you are subscribed to the Google
>>> Groups "Algorithm Geeks" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>>
>>
>



-- 
Regards,
Pradeep

