Hi Ayush,

I am not proficient in dealing with Unicode characters, but don't you need to use something like wchar? Are you sure that a char is enough for storing UTF-8? Don't we need to store UTF-16 characters?
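To make the question concrete: UTF-8 encodes each code point as one to four bytes, so a char can hold one byte of UTF-8 text but not necessarily a whole character. Below is a minimal sketch, not taken from your repository, of turning a UTF-8 std::string into code points before classifying them. The name decode_utf8 is made up for illustration, and the sketch assumes well-formed input.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Decode a UTF-8 byte string into Unicode code points.
// Assumption: the input is well-formed UTF-8. Bytes that cannot start a
// sequence are skipped, and decoding stops at a truncated trailing sequence.
std::vector<char32_t> decode_utf8(const std::string& bytes) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes that follow
        if (lead < 0x80)              { cp = lead;        extra = 0; }
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; extra = 1; }
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; extra = 2; }
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; extra = 3; }
        else { ++i; continue; }  // not a valid lead byte: skip it
        if (i + extra >= bytes.size()) break;  // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            // Each continuation byte contributes its low six bits.
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

With this, the six bytes of "a\xC3\xA9\xE2\x82\xAC" decode to three code points (U+0061, U+00E9, U+20AC), and the Lu/Ll/Lm/Lo check can then be applied to each code point rather than to each byte.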
Regarding your implementation, I have two comments. The first is that you are using a char array of limited length. Why don't you use a string, which manages its own length? The second is the very long if statements. You can add all the characters to an array and use a for loop. A theoretically better solution would be to use a set (http://www.cplusplus.com/reference/set/set/), which can be faster for searches.

Regards,
Amr

On Mar 14, 2020 8:35 PM, Ayush <ayush.pradhan2...@vitbhopal.ac.in> wrote:

Dear Sir/Ma’am,

This is to inform you that I have completed the robust tokenisation coding challenge, treating characters in the Unicode categories Lu (Letter, uppercase), Ll (Letter, lowercase), Lm (Letter, modifier) and Lo (Letter, other) as alphabetic and all other characters as non-alphabetic.

The link to the solution is given below:
https://github.com/git-ayush-pradhan/Apertium_gsoc

Thanks and Regards,
Ayush Pradhan
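On the second comment in the reply above (replacing the long if statements with a lookup), a minimal sketch of the set idea follows. The characters listed are a hypothetical sample, not the ones from the repository, and the names separators and is_separator are made up for illustration.

```cpp
#include <set>

// Hypothetical sample of code points to match; the real list would come
// from the conditions currently spelled out in the long if statements.
const std::set<char32_t>& separators() {
    static const std::set<char32_t> s = {
        U' ', U'\t', U'\n', U'.', U',', U';', U'!', U'?'
    };
    return s;
}

// A single set lookup replaces a chain of `c == x || c == y || ...`.
bool is_separator(char32_t c) {
    return separators().count(c) > 0;
}
```

A std::set lookup takes logarithmic time in the number of stored characters, while a for loop over an array takes linear time. For a handful of characters the difference is probably negligible, so the main gain is likely readability.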
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff