[Ankur-core] Picking between ত্ and ৎ
Hi,

Which one is currently considered standard: ত্ or ৎ? I have seen plenty of websites using both বিদ্যুত্ and বিদ্যুৎ, চিত্কার and চিৎকার. I need to pick one as the standard for Apertium. For the Bengali to English direction we can accept both, but when generating from English to Bengali we need to produce just one.

Thanks in advance.

-- 
Regards
Abu Zaher Md. Faridee
http://zaher14.blogspot.com/
http://sourceforge.net/projects/apertium/
---
Time heals every wound, but time itself is a wound that never heals.

___
Bengalinux-core mailing list
Bengalinux-core@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bengalinux-core
Re: [Ankur-core] Picking between ত্ and ৎ
I just had a talk with Golam Mortuza Bhai about this; pasting it here for future reference :)

(05:52:23 PM) zahe...@gmail.com/HomeC8631CA7: I've mailed you about an issue between ত্ and ৎ; if you get the time, please feel free to answer
(05:52:25 PM) Golam Mortuza Hossain: I mean, I got it
(05:52:30 PM) zahe...@gmail.com/HomeC8631CA7: cool
(05:52:34 PM) Golam Mortuza Hossain: Please
(05:52:42 PM) Golam Mortuza Hossain: follow ৎ
(05:53:26 PM) Golam Mortuza Hossain: Khanda-Ta as a separate glyph is now Unicode standard
(05:54:03 PM) Golam Mortuza Hossain: which wasn't the case earlier
(05:54:41 PM) zahe...@gmail.com/HomeC8631CA7: I was following ৎ all this time, but I came across some sites that use ত্, and in the Unicode character set ৎ has a comment like "a dead consonant form of ta, without implicit vowel, used in some sequences"; that's why I thought I'd consult you
(05:55:48 PM) Golam Mortuza Hossain: the reason for this is that earlier there was no glyph for Khanda-Ta in Unicode
(05:55:59 PM) zahe...@gmail.com/HomeC8631CA7: yeah I know
(05:57:03 PM) Golam Mortuza Hossain: If you want to make it backward compatible then
(05:57:23 PM) Golam Mortuza Hossain: you could consider mapping ত্
(05:57:31 PM) Golam Mortuza Hossain: to ৎ
(05:57:40 PM) Golam Mortuza Hossain: But it could be tricky
(05:58:57 PM) zahe...@gmail.com/HomeC8631CA7: yeah
(05:59:07 PM) zahe...@gmail.com/HomeC8631CA7: I know, I tried a bit
(05:59:36 PM) Golam Mortuza Hossain: :-)
(06:01:17 PM) zahe...@gmail.com/HomeC8631CA7: we might need to build a table for that, e.g. ত্ক → ৎক is always like that, isn't it? But we can't map it like that in উত্তর
(06:01:36 PM) zahe...@gmail.com/HomeC8631CA7: so we might need to check all of these :(
(06:02:32 PM) Golam Mortuza Hossain: If I remember correctly, sometimes people also
(06:02:42 PM) Golam Mortuza Hossain: used ZWNJ after Halant
(06:02:51 PM) zahe...@gmail.com/HomeC8631CA7: yeah
(06:03:03 PM) zahe...@gmail.com/HomeC8631CA7: I've seen that too
(06:03:21 PM) Golam Mortuza Hossain: this case should be easy
(06:04:30 PM) Golam Mortuza Hossain: also when it appears just before ',', ':', '।', '?', etc.
(06:04:44 PM) zahe...@gmail.com/HomeC8631CA7: I'm already running the source text through a normalizer, because of ড় vs. ড + nukta: we sometimes get text in the complex (ড + nukta) form and the parser gets confused
(06:04:54 PM) zahe...@gmail.com/HomeC8631CA7: aha
(06:05:23 PM) Golam Mortuza Hossain: yeah I see
(06:06:50 PM) zahe...@gmail.com/HomeC8631CA7: so you think it's doable, right?
(06:07:22 PM) Golam Mortuza Hossain: no
(06:07:52 PM) zahe...@gmail.com/HomeC8631CA7: btw, could I paste this conversation in the group as a reference for the others?
(06:09:11 PM) Golam Mortuza Hossain: In some cases unambiguous mapping may not be possible
(06:09:16 PM) Golam Mortuza Hossain: Yeah, sure
(06:13:37 PM) Golam Mortuza Hossain: My suggestion would be to handle only ৎ in the engine.
(06:15:28 PM) Golam Mortuza Hossain: If needed, the mapping should be done in a text pre-parser.
(06:16:21 PM) Golam Mortuza Hossain: In the long term, ত্ will go away!
(06:16:30 PM) zahe...@gmail.com/HomeC8631CA7: I agree

-- 
Regards
Abu Zaher Md. Faridee
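Golam's suggestion above (handle only ৎ in the engine and do any ত্ → ৎ mapping in a text pre-parser) might look roughly like the following. This is a minimal Python sketch under stated assumptions, not Apertium code: the function name `normalize_khanda_ta` and the `SAFE_FOLLOWERS` table are hypothetical, and the table is seeded only with ক from the ত্ক → ৎক example in the chat. It covers the two "easy" cases Golam mentions (ত্ followed by ZWNJ, and ত্ just before whitespace or punctuation), leaves clusters such as উত্তর untouched, and applies Unicode NFC first. NFC also settles the ড় question in one direction: U+09DC is a composition exclusion, so NFC always yields the two-character ড + nukta form.

```python
import re
import unicodedata

TA = "\u09A4"         # ত  BENGALI LETTER TA
HASANTA = "\u09CD"    # ্  BENGALI SIGN VIRAMA (hasanta / halant)
KHANDA_TA = "\u09CE"  # ৎ  BENGALI LETTER KHANDA TA
ZWNJ = "\u200C"       # ZERO WIDTH NON-JOINER

# Easy case 1: ত্ + ZWNJ, where the writer explicitly blocked a conjunct.
TA_ZWNJ = re.compile(TA + HASANTA + ZWNJ)
# Easy case 2: word-final ত্, i.e. before whitespace, punctuation or end of text.
TA_WORD_FINAL = re.compile(TA + HASANTA + r"(?=[\s,;:।?!]|$)")
# Hypothetical hand-curated table: consonants after which ত্ is assumed to be ৎ.
# ত must never be listed here, otherwise উত্তর would be broken.
SAFE_FOLLOWERS = "\u0995"  # ক only, as in চিত্কার -> চিৎকার
TA_BEFORE_SAFE = re.compile(TA + HASANTA + "(?=[" + SAFE_FOLLOWERS + "])")


def normalize_khanda_ta(text: str) -> str:
    """Map legacy ত্ spellings to ৎ where the context is assumed unambiguous."""
    # NFC decomposes U+09DC (ড়) into U+09A1 U+09BC because of the
    # composition exclusion, giving the parser one consistent spelling.
    text = unicodedata.normalize("NFC", text)
    text = TA_ZWNJ.sub(KHANDA_TA, text)
    text = TA_WORD_FINAL.sub(KHANDA_TA, text)
    text = TA_BEFORE_SAFE.sub(KHANDA_TA, text)
    return text


if __name__ == "__main__":
    # বিদ্যুত্ চিত্কার উত্তর, written with escapes to avoid invisible-character mistakes
    sample = ("\u09AC\u09BF\u09A6\u09CD\u09AF\u09C1\u09A4\u09CD "
              "\u099A\u09BF\u09A4\u09CD\u0995\u09BE\u09B0 "
              "\u0989\u09A4\u09CD\u09A4\u09B0")
    print(normalize_khanda_ta(sample))  # বিদ্যুৎ চিৎকার উত্তর
```

The curated table is the weak point, as Golam warns: every consonant added to `SAFE_FOLLOWERS` would need to be checked against real words, since some ত্ + consonant sequences are genuine conjuncts.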
Re: [Ankur-core] Ankur Abhidhan project needs help!
Dear Golam Bhai,

For my GSoC project, I'll also need to work on an English to Bengali dictionary (POS-tagged), which will only hold references to the lemmata. Currently I'm busy with the Bengali morphological generation part, but after I finish that I can move on to the dictionary part. I'll contact you and Jamil bhai soon.

On Tue, Jun 9, 2009 at 3:03 PM, Jamil Ahmed itsja...@gmail.com wrote:

Dear Golam bhai,

I will check and let you know soon. :)

Regards,
-Jamil

2009/6/7 Golam Mortuza Hossain gmhoss...@gmail.com

Hi All,

The Ankur English to Bengali dictionary project [1] has been serving more and more users for quite some time. According to Google Analytics, the Ankur E2B dictionary served more than sixty thousand requests last month alone [2]. It has also led to increased contributions.

Unfortunately, the project lacks the manpower to keep up with the increased demand. Also, I have been unable to give enough time to the project lately, and I don't see my situation changing anytime soon. Consequently, a large number of contributed entries remain unedited [3].

So I am now seeking opinions from Ankur members on how to sustain the project meaningfully. I would be happy to make a personal request to anyone who might be interested in helping the project by any means. If you know of someone, either from Ankur or outside, who could help in this regard, please let me know. It may also be helpful to forward this request to other interested groups.

[1] http://www.bengalinux.org/english-to-bengali-dictionary/
[2] http://www.bengalinux.org/english-to-bengali-dictionary/VisitorsOverviewReport.pdf
[3] http://www.bengalinux.org/cgi-bin/abhidhan/statistics.pl

Cheers,
Golam

-- 
Regards
Abu Zaher Md. Faridee
Re: [Ankur-core] XML standard for Ankur's Abhidhan
You might also find it helpful to look at the Apertium dictionary format, which is also standard XML. Here is the SVN link for the Nepalese language (it's the closest language to Bengali we have in Apertium so far, and the Bengali pair is far from finished :( ): http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-bn-en/

I have been trying to find a standard tag set for the Bengali language. So far I'm making do with the Penn Treebank tagset, but in the future I might need to extend it for my project's requirements. *However, I believe the Penn Treebank tagset to be sufficient for a general-purpose dictionary format.*

The attached file contains the Penn Treebank tagset and also the bilingual dictionary format from Apertium. What I'd like to propose is that instead of using <pos_tag>Verb, non-3rd person singular present</pos_tag>, you could create definitions like verb, person, number and tense, and then use them as properties of the specific entry. It'd be easier to parse in the future.

On Wed, May 13, 2009 at 8:02 AM, Golam Mortuza Hossain gmhoss...@gmail.com wrote:

Hi,

On Tue, May 12, 2009 at 5:13 PM, Salahuddin Pasha salahuddi...@gmail.com wrote:

Basic work is already done, but we need to define a standard XML format (XML DTD or XML Schema). Example test XML output:

<?xml version="1.0" encoding="utf-8"?>
<dictionary>
<search_results>
<dict_entry id="1">
<en_word>read</en_word>
<pos_tag>Noun, singular or mass</pos_tag>

Thanks a lot for your work. I would suggest that you also include an entry for the Penn tag for parts of speech (POS), like NN, VV etc. So something like

<penn_tag>NN</penn_tag>

This would be needed if the Anubadok Online interface needs to update its database using your XML gateway to the Ankur dictionary database.

Cheers,
Golam

-- 
Regards
Abu Zaher Md. Faridee
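To make the proposal above concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The `<features>` element, its attribute names, the helper functions and the `PENN_FEATURES` mapping are hypothetical illustrations, not Ankur's or Apertium's actual schema; only `search_results`, `dict_entry`, `en_word` and `penn_tag` come from the thread. Each entry keeps the compact Penn tag Golam asked for and also spells out person, number and tense as separate attributes, so a consumer can filter entries without parsing strings like "Verb, non-3rd person singular present".

```python
import xml.etree.ElementTree as ET

# Hypothetical feature split for a few Penn Treebank tags (not an official mapping).
PENN_FEATURES = {
    "NN": {"pos": "noun", "number": "singular"},
    "NNS": {"pos": "noun", "number": "plural"},
    "VBD": {"pos": "verb", "tense": "past"},
    "VBZ": {"pos": "verb", "person": "3", "number": "singular", "tense": "present"},
    # VBP means "not 3rd person singular, present"; that does not reduce to a
    # single person and number value, so only tense is recorded here.
    "VBP": {"pos": "verb", "tense": "present"},
}


def make_entry(entry_id: int, en_word: str, penn_tag: str) -> ET.Element:
    """Build one <dict_entry> holding the Penn tag plus its spelled-out features."""
    entry = ET.Element("dict_entry", id=str(entry_id))
    ET.SubElement(entry, "en_word").text = en_word
    ET.SubElement(entry, "penn_tag").text = penn_tag
    # Each feature becomes an attribute, so consumers never parse free text.
    ET.SubElement(entry, "features", PENN_FEATURES[penn_tag])
    return entry


def entries_with(root: ET.Element, **wanted: str) -> list:
    """Return en_word of every dict_entry whose features match all of `wanted`."""
    words = []
    for entry in root.iter("dict_entry"):
        feats = entry.find("features")
        if feats is not None and all(feats.get(k) == v for k, v in wanted.items()):
            words.append(entry.findtext("en_word"))
    return words


if __name__ == "__main__":
    results = ET.Element("search_results")
    results.append(make_entry(1, "read", "VBP"))
    results.append(make_entry(2, "reads", "VBZ"))
    results.append(make_entry(3, "read", "VBD"))
    print(ET.tostring(results, encoding="unicode"))
    print(entries_with(results, pos="verb", tense="present"))  # ['read', 'reads']
```

Keeping `penn_tag` next to the split features is a design choice: tools that only understand the compact tag could keep working, while new code can query individual properties such as tense directly.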