[Ankur-core] Picking between ত্ and ৎ

2009-08-08 Thread Abu Zaher
Hi,

Right now which one is considered standard ত্ or ৎ? I mean I have seen
plenty of websites with বিদ্যুত্ and বিদ্যুৎ, চিত‍্কার and চিৎকার। I need
to pick one as a standard for Apertium. For the Bengali to English
direction we could accept both, but when generating from English to Bengali
we need to generate only one.

Thanks in advance.

-- 
Regards
Abu Zaher Md. Faridee

http://zaher14.blogspot.com/
http://sourceforge.net/projects/apertium/
---
Time heals every wound, but time itself is a wound that never heals.
___
Bengalinux-core mailing list
Bengalinux-core@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bengalinux-core


Re: [Ankur-core] Picking between ত্ and ৎ

2009-08-08 Thread Abu Zaher
I just had a chat about this with Golam Mortuza Bhai; pasting it here for
future reference :)

(05:52:23 PM) zahe...@gmail.com/HomeC8631CA7: I've mailed you regarding an
issue between 'ত্' and 'ৎ'; if you get the time, please feel free to answer
(05:52:25 PM) Golam Mortuza Hossain: I mean I got
(05:52:30 PM) zahe...@gmail.com/HomeC8631CA7: cool
(05:52:34 PM) Golam Mortuza Hossain: Please
(05:52:42 PM) Golam Mortuza Hossain: follow ৎ
(05:53:26 PM) Golam Mortuza Hossain: Khanda-Ta as a separate glyph is now
Unicode standard
(05:54:03 PM) Golam Mortuza Hossain: which wasn't the case earlier
(05:54:41 PM) zahe...@gmail.com/HomeC8631CA7: I was following ৎ all this
time, but I came across some sites that use ত্, and in the Unicode
character set ৎ has a comment like this: "a dead consonant form of ta,
without implicit vowel, used in some sequences". That's why I thought I'd
consult you
(05:55:48 PM) Golam Mortuza Hossain: the reason for this, earlier there was
no glyph for Khanda-Ta in Unicode
(05:55:59 PM) zahe...@gmail.com/HomeC8631CA7: yeah I know
(05:57:03 PM) Golam Mortuza Hossain: If you want to make it backward
compatible then
(05:57:23 PM) Golam Mortuza Hossain: you could consider mapping ত্
(05:57:31 PM) Golam Mortuza Hossain: to ৎ
(05:57:40 PM) Golam Mortuza Hossain: But it could be tricky
(05:58:57 PM) zahe...@gmail.com/HomeC8631CA7: yeah
(05:59:07 PM) zahe...@gmail.com/HomeC8631CA7: I know, I tried a bit
(05:59:36 PM) Golam Mortuza Hossain: :-)
(06:01:17 PM) zahe...@gmail.com/HomeC8631CA7: we might need to build a table
for that, e.g. ত‍্ক -> ৎক; it's always like that, isn't it? But we can't map
it like that in উত্তর
(06:01:36 PM) zahe...@gmail.com/HomeC8631CA7: so we might need to check
all of these :(
(06:02:32 PM) Golam Mortuza Hossain: If I remember correctly, sometimes
people also
(06:02:42 PM) Golam Mortuza Hossain: used ZWNJ after Halant
(06:02:51 PM) zahe...@gmail.com/HomeC8631CA7: yeah
(06:03:03 PM) zahe...@gmail.com/HomeC8631CA7: I've seen that too
(06:03:21 PM) Golam Mortuza Hossain: this case should be easy
(06:04:30 PM) Golam Mortuza Hossain: also when it appears just before , ,
:, ।, ?,   etc.
(06:04:44 PM) zahe...@gmail.com/HomeC8631CA7: I'm already running the source
text through a normalizer right now, because ড় can be written as ড + nukta;
we sometimes get text in that complex form and the parser gets confused
(06:04:54 PM) zahe...@gmail.com/HomeC8631CA7: aha
(06:05:23 PM) Golam Mortuza Hossain: yeah I see
(06:06:50 PM) zahe...@gmail.com/HomeC8631CA7: so you think it's doable,
right?
(06:07:22 PM) Golam Mortuza Hossain: no
(06:07:52 PM) zahe...@gmail.com/HomeC8631CA7: btw, could I paste this
conversation in the group just as a reference for the others?
(06:09:11 PM) Golam Mortuza Hossain: In some cases unambiguous mapping may
not be possible
(06:09:16 PM) Golam Mortuza Hossain: Yeah, sure
(06:13:37 PM) Golam Mortuza Hossain: My suggestion would be to handle only ৎ
in the engine.
(06:15:28 PM) Golam Mortuza Hossain: If needed, the mapping should be done
in a text pre-parser.
(06:16:21 PM) Golam Mortuza Hossain: In the long term the ত্ form will
go away!
(06:16:30 PM) zahe...@gmail.com/HomeC8631CA7: I agree
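
The pre-parser mapping discussed above could be sketched roughly as follows.
This is only an illustration, not Apertium code: the function name
normalize_khanda_ta and the punctuation set are assumptions. It rewrites
ত্ to ৎ only in the two cases called easy in the chat (an explicit
ZWNJ/ZWJ after the hasanta, or ত্ at the end of a word), and leaves
ত্ before another consonant alone, since that may be a genuine conjunct
as in উত্তর.

```python
import re

TA = "\u09A4"         # BENGALI LETTER TA
HASANTA = "\u09CD"    # BENGALI SIGN VIRAMA (hasanta / halant)
KHANDA_TA = "\u09CE"  # BENGALI LETTER KHANDA TA
ZWNJ = "\u200C"       # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"        # ZERO WIDTH JOINER

# Case 1: ta + hasanta followed by an explicit joiner control character.
_EXPLICIT = re.compile(TA + HASANTA + "[" + ZWNJ + ZWJ + "]")

# Case 2: ta + hasanta at the end of the text, or just before whitespace
# or punctuation (the punctuation list here is an assumed, partial set).
_WORD_FINAL = re.compile(TA + HASANTA + r"(?=$|[\s,;:?!.\u0964\u0965])")


def normalize_khanda_ta(text):
    """Map the unambiguous legacy spellings of khanda ta to U+09CE."""
    text = _EXPLICIT.sub(KHANDA_TA, text)
    return _WORD_FINAL.sub(KHANDA_TA, text)


if __name__ == "__main__":
    for word in ["বিদ্যুত্", "উত্তর"]:
        print(word, "->", normalize_khanda_ta(word))
```

Word-internal cases without a joiner (ত্ directly followed by ক, for
example) are deliberately not touched here; those would need the lookup
table mentioned in the chat.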

--
Regards
Abu Zaher Md. Faridee



Re: [Ankur-core] Ankur Abhidhan project needs help!

2009-06-09 Thread Abu Zaher
Dear Golam Bhai,

For my GSoC project, I'll also need to work on an English to Bengali
dictionary (POS tagged), which will only hold the references of the lemmata.
Currently I'm busy with the Bengali morphological generation part, but after
I finish that I can move on to the dictionary part. I'll contact you and
Jamil bhai soon.

On Tue, Jun 9, 2009 at 3:03 PM, Jamil Ahmed itsja...@gmail.com wrote:

 Dear Golam bhai,

 I will check and let you know soon. :)

 Regards,
 -Jamil


 2009/6/7 Golam Mortuza Hossain gmhoss...@gmail.com

  Hi All,
 
  Ankur English to Bengali dictionary project [1] has been serving
  more and more users for quite some time.  According
  to Google Analytics, the Ankur E2B dictionary project served
  more than sixty thousand requests in the last month alone [2].
 
  It has also led to increased contributions.  Unfortunately,
  the project lacks the manpower to keep up with the
  increased demand. Also, I have been unable to give enough time
  to the project lately and I don't see my situation changing
  anytime soon. Consequently, a large number of contributed
  entries remain unedited [3].
 
  So I am now seeking opinions from Ankur members to sustain
  the project meaningfully.
 
  I would be happy to make personal request to anyone who
  might be interested in helping the project by any means.
  In case, you know of someone either from Ankur or outside,
  who could help in this regard, then please let me know.
  It may be helpful to forward this request to any other
  interested groups.
 
 
  [1]  http://www.bengalinux.org/english-to-bengali-dictionary/
  [2]  http://www.bengalinux.org/english-to-bengali-dictionary/VisitorsOverviewReport.pdf
  [3]  http://www.bengalinux.org/cgi-bin/abhidhan/statistics.pl
 
  Cheers,
  Golam
 
 
 
 





-- 
Regards
Abu Zaher Md. Faridee



Re: [Ankur-core] XML standard for Ankur's Abhidhan

2009-05-13 Thread Abu Zaher
You might also find it helpful to look at the Apertium dictionary format,
which is also standard XML. Here is the link to svn for the Nepalese language
(it's the closest language to Bengali we have in Apertium so far, and the
Bengali pair is far from finished :( )
http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-bn-en/.

I have been working to find a standard tag set for the Bengali language. So
far I'm making do with the Penn Treebank tagset, but in the future I might
need to extend it to meet my project requirements. *However, I believe
the Penn Treebank tagset to be sufficient for a general purpose dictionary
format.*

The attached file contains the Penn Treebank tagset and also the bilingual
dictionary format from Apertium.

What I'd like to propose is that, instead of using
<pos_tag>Verb, non-3rd person singular present</pos_tag>,
you could create some definitions like verb, person, number, tense
and then use them as properties of the specific entry. It'd be easier to
parse in the future.
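
As a rough sketch of that proposal (not an agreed schema), the entry
could carry the features as separate attributes. The element names
dict_entry, en_word and pos_tag come from the test output quoted below;
the attribute names (penn, pos, person, number, tense), the small
PENN_FEATURES table and the build_entry helper are only assumptions made
for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a few Penn Treebank tags to separate features.
PENN_FEATURES = {
    "NN": {"pos": "noun", "number": "singular_or_mass"},
    "NNS": {"pos": "noun", "number": "plural"},
    "VBP": {"pos": "verb", "person": "non-3rd", "number": "singular",
            "tense": "present"},
    "VBZ": {"pos": "verb", "person": "3rd", "number": "singular",
            "tense": "present"},
}


def build_entry(entry_id, word, penn_tag):
    """Build one dict_entry whose pos_tag holds features as attributes."""
    entry = ET.Element("dict_entry", id=str(entry_id))
    ET.SubElement(entry, "en_word").text = word
    # Keep the raw Penn tag too, so existing consumers can still read it.
    ET.SubElement(entry, "pos_tag", penn=penn_tag, **PENN_FEATURES[penn_tag])
    return entry


if __name__ == "__main__":
    entry = build_entry(1, "read", "VBP")
    print(ET.tostring(entry, encoding="unicode"))
```

A consumer can then read entry.find("pos_tag").get("tense") directly
instead of parsing a free-text description.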

On Wed, May 13, 2009 at 8:02 AM, Golam Mortuza Hossain
gmhoss...@gmail.com wrote:

 Hi,

 On Tue, May 12, 2009 at 5:13 PM, Salahuddin Pasha
 salahuddi...@gmail.com wrote:
  Basic work is already done, but we need to define a standard XML (XML
  DTD or XML Schema).
  Example: test XML output.
 
  <?xml version="1.0" encoding="utf-8"?>
  <dictionary>
    <search_results>
      <dict_entry id="1">
        <en_word>read</en_word>
        <pos_tag>Noun, singular or mass</pos_tag>


 Thanks a lot for your work.

 I would suggest that you also include an entry for the Penn tag
 for the part of speech (POS), like NN, VV etc. So something like

 <penn_tag>NN</penn_tag>

 This would be needed if the Anubadok Online interface needs to update its
 database using your XML gateway to the Ankur dictionary database.

 Cheers,
 Golam






-- 
Regards
Abu Zaher Md. Faridee
