Hi Ayush,

I am not proficient in dealing with Unicode characters, but don't you need to 
use something like wchar? Are you sure that a char is enough for storing UTF-8? 
Don't we need to store UTF-16 characters?
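
To make the question concrete: a single char holds one UTF-8 byte, and one character (code point) can take one to four of those bytes, so a plain char array can store the text but any per-character test has to decode it first. Below is a rough sketch of that decoding step, not taken from your repository. The name decode_utf8 is made up for illustration, and the code assumes well-formed input (it does not reject overlong or truncated sequences).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decode a UTF-8 byte string into Unicode code points.
// Sketch only: assumes well-formed input; overlong or truncated sequences
// are not detected.
std::vector<char32_t> decode_utf8(const std::string& bytes) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80) {                 // 0xxxxxxx: ASCII, 1 byte
            cp = lead;
        } else if ((lead & 0xE0) == 0xC0) { // 110xxxxx: 2-byte sequence
            cp = lead & 0x1F;
            extra = 1;
        } else if ((lead & 0xF0) == 0xE0) { // 1110xxxx: 3-byte sequence
            cp = lead & 0x0F;
            extra = 2;
        } else if ((lead & 0xF8) == 0xF0) { // 11110xxx: 4-byte sequence
            cp = lead & 0x07;
            extra = 3;
        } else {                            // stray continuation or invalid lead
            cp = 0xFFFD;                    // U+FFFD REPLACEMENT CHARACTER
        }
        ++i;
        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 0; k < extra && i < bytes.size(); ++k, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
        }
        out.push_back(cp);
    }
    return out;
}
```

With something like this, the alphabetic test would run on char32_t values rather than on individual bytes, and UTF-16 or wchar would not be needed just to store the text.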

Regarding your implementation, I have two comments.
The first one is that you are using a char array of limited length.
Why don't you use a string, which manages its own length?
The second is the very long if statements.
You can add all the characters to an array and use a for loop. A theoretically 
better solution would be to use a set 
(http://www.cplusplus.com/reference/set/set/), which can be faster for 
searches.
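
Here is a minimal sketch of both comments together, not taken from your code. The separator list, kSeparators, is_separator and split_tokens are all hypothetical names and values chosen for illustration; the real character list would come from your if statements. Tokens are built in a std::u32string, which grows as needed, and the long chain of comparisons becomes a single lookup in a std::set.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical separator list; replace with the characters that the
// long if statements currently compare against.
const std::set<char32_t> kSeparators = {U' ', U',', U'.', U'!', U'?', U';', U':'};

// One set lookup instead of a chain of == comparisons.
bool is_separator(char32_t cp) {
    return kSeparators.count(cp) != 0;
}

// Split text into tokens at separators. Each token is a std::u32string,
// so there is no fixed maximum token length.
std::vector<std::u32string> split_tokens(const std::u32string& text) {
    std::vector<std::u32string> tokens;
    std::u32string current;
    for (char32_t cp : text) {
        if (is_separator(cp)) {
            if (!current.empty()) {
                tokens.push_back(current);
                current.clear();
            }
        } else {
            current.push_back(cp);
        }
    }
    if (!current.empty()) {
        tokens.push_back(current);  // flush the last token
    }
    return tokens;
}
```

For a handful of characters a plain array with a loop may be just as fast in practice; the set mainly keeps the code short and makes the list easy to extend.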

Regards,
Amr


On Mar 14, 2020 8:35 PM, Ayush <ayush.pradhan2...@vitbhopal.ac.in> wrote:

Dear Sir/Ma’am,

This is to inform you that I have completed the robust tokenisation 
coding challenge, treating characters in the Unicode categories Lu (Letter, 
uppercase), Ll (Letter, lowercase), Lm (Letter, modifier) and Lo (Letter, 
other) as alphabetic and all other characters as non-alphabetic.

The link to the solution is given below:

https://github.com/git-ayush-pradhan/Apertium_gsoc

Thanks and Regards,

Ayush Pradhan



_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
