Re: [Ankur-core] XML standard for Ankur's Abhidhan

2009-05-14 Thread Salahuddin Pasha

On May 13, 2009, at 10:57 PM, Deepayan Sarkar wrote:

 On 5/12/09, Salahuddin Pasha salahuddi...@gmail.com wrote:
 Dear all,

 I have been working on XML support for অভিধান (Abhidhan), to
 enable various applications and tools to use our dictionary.

 The basic work is already done, but we need to define a standard XML
 format (an XML DTD or an XML Schema).

 Any suggestions or comments?

 Back in 2003, the bengalinux dictionary list had a discussion on this.
 Nothing ever came out of it, and when Golam first started on anubadok,
 his emphasis was more specialized. In any case, that discussion may
 provide some suggestions.

 You can get it from the list archives, and I'm also attaching a
 cleaned up and edited version of the thread here:

 ...
 
   
 From: Kaushik Ghose kgh...@wa... -  2003-05-16 15:07
   
 
  <?xml version="1.0"?>
  <!ELEMENT dictionary (entry*)>
  <!ELEMENT entry (word, info*)>
  <!ELEMENT word (#CDATA)>
  <!ELEMENT info (refer?, pron?, synonym?, antonym?, meaning?, grammar?)>
  <!ATTLIST info pos (n|adj|v|adv) "n" plural (true|false) "false"
            origin CDATA #DEFAULT  date CDATA>
  <!ELEMENT refer (#CDATA)>
  <!ELEMENT pron (#CDATA)>
  <!ELEMENT synonym (#CDATA)>
  <!ATTLIST synonym lang CDATA #DEFAULT "bn">
  <!ELEMENT antonym (#CDATA)>
  <!ATTLIST antonym lang CDATA #DEFAULT "bn">
  <!ELEMENT meaning (#CDATA)>
  <!ATTLIST meaning lang CDATA #DEFAULT "bn">
  <!ELEMENT grammar (derivative?)>
  <!ELEMENT derivative (#CDATA)>
  <!ATTLIST derivative form (the|of) "the" num (singular|plural) "singular">


  Also, to answer Deepayan's question: by "date" I was thinking of the date
  of origin, first use, etc.

  Will potter with Qt.

  Right now, I'm going to hardcode the DTD structure. I can't think of a
  simple way of creating an editor that will parse the DTD and configure the
  GUI on the fly; fixed boxes for all the elements will be quicker for a DTD
  of this size.

  PS. Try the Perl tool at
  http://www.sagehill.net/livedtd/download.html

  -kg  
   

 /thread





Dear Deepayan bhai,

  Thank you for your mail.


Here is an example of the current, updated format:

<?xml version="1.0" encoding="utf-8"?>
<dictionary>
  <search_results>
    <dict_entry>
      <bdict_id>68218</bdict_id>
      <en_word>apple</en_word>
      <pos_tag>Proper noun, singular</pos_tag>
      <penn_tag>NP</penn_tag>
      <bn_pronunciation></bn_pronunciation>
      <en_leema></en_leema>
      <bn_word>অ্যাপল</bn_word>
      <explanation></explanation>
      <example>উদাঃ</example>
      <status>EDITED</status>
    </dict_entry>
  </search_results>
</dictionary>




Going by Deepayan bhai's mail, I think we still need to add the following
fields. We will add them in a later version, as we do not have enough
information for them now.

origin="deshi"
<synonyms>...</synonyms>
<antonyms>...</antonyms>

<entry>
  <info pos="noun" plural="false" origin="deshi">
    <synonyms>...</synonyms>
    <antonyms>...</antonyms>
  </info>
</entry>

<grammar>
  <derivative form="the">chhaanaaTaa,chhaanaaTi</derivative>
  <derivative form="of" num="singular">chhaanaaTir</derivative>
  <derivative form="of" num="plural">chhaanaader</derivative>
</grammar>





Another question is which would be better for us: using a grammar tag and
storing the information in nested tags, or keeping the plain, flat layout
of the current example above?
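
To make the trade-off concrete, here is a minimal Python sketch that reads the same derivative information from both layouts. The nested form follows the grammar/derivative example from the 2003 thread; the flat tag names (derivative_the, derivative_of_singular, derivative_of_plural) and both helper functions are hypothetical, invented only for this comparison.

```python
import xml.etree.ElementTree as ET

# Nested layout, following the <grammar>/<derivative> example from the thread.
NESTED = """
<dict_entry>
  <bn_word>chhaanaa</bn_word>
  <grammar>
    <derivative form="the">chhaanaaTaa,chhaanaaTi</derivative>
    <derivative form="of" num="singular">chhaanaaTir</derivative>
    <derivative form="of" num="plural">chhaanaader</derivative>
  </grammar>
</dict_entry>
"""

# Flat layout. These tag names are hypothetical: one child tag per form.
FLAT = """
<dict_entry>
  <bn_word>chhaanaa</bn_word>
  <derivative_the>chhaanaaTaa,chhaanaaTi</derivative_the>
  <derivative_of_singular>chhaanaaTir</derivative_of_singular>
  <derivative_of_plural>chhaanaader</derivative_of_plural>
</dict_entry>
"""


def nested_derivatives(xml_text):
    """Return {(form, num): text}, reading the attributes of each <derivative>."""
    root = ET.fromstring(xml_text)
    return {
        (d.get("form"), d.get("num")): d.text
        for d in root.findall("./grammar/derivative")
    }


def flat_derivatives(xml_text):
    """Return the same mapping, recovering form and num from the tag name."""
    root = ET.fromstring(xml_text)
    result = {}
    for child in root:
        if child.tag.startswith("derivative_"):
            parts = child.tag.split("_")[1:]  # ["of", "singular"] or ["the"]
            form = parts[0]
            num = parts[1] if len(parts) > 1 else None
            result[(form, num)] = child.text
    return result


if __name__ == "__main__":
    print(nested_derivatives(NESTED))
    print(flat_derivatives(FLAT))
```

Both functions return the same mapping. In the nested layout a new form or number is just a new attribute value, while in the flat layout every combination needs its own tag name and the reader has to take tag names apart.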





regards
salahuddin
--
The NEW KODAK i700 Series Scanners deliver under ANY circumstances! Your
production scanning environment may not be a perfect world - but thanks to
Kodak, there's a perfect scanner to get the job done! With the NEW KODAK i700
Series Scanner you'll get full speed at 300 dpi even with all image 
processing features enabled. http://p.sf.net/sfu/kodak-com
___
Bengalinux-core mailing list
Bengalinux-core@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bengalinux-core


Re: [Ankur-core] XML standard for Ankur's Abhidhan

2009-05-13 Thread Abu Zaher
You might also find it helpful to look at the Apertium dictionary format,
which is also standard XML. Here is the link to SVN for the Nepalese language
(it's the closest language to Bengali that we have in Apertium so far, and
the Bengali pair is far from finished :( )
http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-bn-en/.

I have been working to find some standard tag sets for the Bengali language.
So far I'm also getting by with the Penn Treebank tagset, but in the future I
might need to extend it to meet my project's requirements. *However, I believe
the Penn Treebank tagset is sufficient for a general-purpose dictionary
format.*

The attached file contains the Penn Treebank tagset and also the bilingual
dictionary format from Apertium.

What I'd like to propose is that, instead of using
<pos_tag>Verb, non-3rd person singular present</pos_tag>,
you create some definitions like verb, person, number and tense, and then
use them as properties of the specific entry. That would be easier to
parse in the future.
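
A minimal Python sketch of this idea, under stated assumptions: the attribute names (pos, person, number, tense, form) and the helper make_entry are hypothetical, and the table covers only a handful of Penn Treebank tags for illustration. The element names en_word, bn_word and pos_tag are the ones used in Salahuddin's example.

```python
import xml.etree.ElementTree as ET

# Illustrative subset of Penn Treebank tags mapped to separate properties.
# The property names and values are assumptions, not an agreed standard.
PENN_FEATURES = {
    "NN": {"pos": "noun", "number": "singular"},
    "NNS": {"pos": "noun", "number": "plural"},
    "VB": {"pos": "verb", "form": "base"},
    "VBP": {"pos": "verb", "person": "non-3rd",
            "number": "singular", "tense": "present"},
    "VBZ": {"pos": "verb", "person": "3rd",
            "number": "singular", "tense": "present"},
}


def make_entry(en_word, bn_word, penn_tag):
    """Build a dict_entry whose pos_tag carries properties as attributes."""
    entry = ET.Element("dict_entry")
    ET.SubElement(entry, "en_word").text = en_word
    ET.SubElement(entry, "bn_word").text = bn_word
    # Each property becomes its own attribute instead of one free-text string.
    ET.SubElement(entry, "pos_tag", attrib=dict(PENN_FEATURES[penn_tag]))
    return entry


if __name__ == "__main__":
    entry = make_entry("read", "পাঠ করা", "VBP")
    print(ET.tostring(entry, encoding="unicode"))
```

A consumer can then filter on a single property, for example entry.find("pos_tag").get("tense"), without parsing a phrase such as "Verb, non-3rd person singular present".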

On Wed, May 13, 2009 at 8:02 AM, Golam Mortuza Hossain
gmhoss...@gmail.com wrote:

 Hi,

 On Tue, May 12, 2009 at 5:13 PM, Salahuddin Pasha
 salahuddi...@gmail.com wrote:
  Basic work is already done, but we need to define a standard XML (XML
  DTD or XML Schema).
  Example: test XML output.
 
  <?xml version="1.0" encoding="utf-8"?>
  <dictionary>
    <search_results>
      <dict_entry id="1">
        <en_word>read</en_word>
        <pos_tag>Noun, singular or mass</pos_tag>


 Thanks a lot for your work.

 I would suggest that you also include an entry for the Penn tag
 for parts of speech (POS), such as NN, VV etc. So something like

 <penn_tag>NN</penn_tag>

 This would be needed if the Anubadok Online interface needs to update its
 database using your XML gateway to the Ankur dictionary database.

 Cheers,
 Golam






-- 
Regards
Abu Zaher Md. Faridee

http://zaher14.blogspot.com/
---
Time heals every wound, but time itself is a wound that never heals.


Re: [Ankur-core] XML standard for Ankur's Abhidhan

2009-05-13 Thread Deepayan Sarkar
On 5/12/09, Salahuddin Pasha salahuddi...@gmail.com wrote:
 Dear all,

  I have been working on XML support for অভিধান (Abhidhan), to
  enable various applications and tools to use our dictionary.

  The basic work is already done, but we need to define a standard XML
  format (an XML DTD or an XML Schema).

  Any suggestions or comments?

Back in 2003, the bengalinux dictionary list had a discussion on this.
Nothing ever came out of it, and when Golam first started on anubadok,
his emphasis was more specialized. In any case, that discussion may
provide some suggestions.

You can get it from the list archives, and I'm also attaching a
cleaned up and edited version of the thread here:

thread from May 2003



[Ankur-dictionary] dictionary.dtd
From: Kaushik Ghose kgh...@wa... -2003-05-14 04:17



  Hi,
  here is the descriptor file.
  I'm new to XML and DTDs, so please go over the semantics as well as the
  syntax and see if this serves our purpose...


  <?xml version="1.0"?>
  <!ELEMENT entry*(word_bn, info_bn*)>
  <!ELEMENT word_bn (#CDATA)>
  <!ELEMENT info_bn (english, pronounciation_bn,meaning_bn)>
  <!ELEMENT english  (#CDATA)>
  <!ELEMENT pronounciation_bn  (#CDATA)>
  <!ELEMENT meaning_bn  (#CDATA)>

  thanks
  -kg   



From: Kaushik Ghose kgh...@wa... -2003-05-14 05:12


  OK, a small correction; Qt's DOM class seems to parse this correctly:

  dictionary.dtd

  <?xml version="1.0"?>
  <!ELEMENT dictionary (entry*)>
  <!ELEMENT entry (word_bn, info_bn*)>
  <!ELEMENT word_bn (#CDATA)>
  <!ELEMENT info_bn (english?, pronounciation_bn?,meaning_bn?)>
  <!ELEMENT english  (#CDATA)>
  <!ELEMENT pronounciation_bn  (#CDATA)>
  <!ELEMENT meaning_bn  (#CDATA)>


  test.xml

  <?xml version="1.0"?>
  <!DOCTYPE entry SYSTEM "dictionary.dtd">
  <dictionary>
    <entry>
      <word_bn>?   ???</word_bn>
      <info_bn>
        <english>seedling</english>
        <pronounciation_bn>ankur</pronounciation_bn>
        <meaning_bn>???   ???
  ??
  ??</meaning_bn>
      </info_bn>
    </entry>

    <entry>
      <word_bn>?   ?</word_bn>
      <info_bn>
        <english>bangla</english>
        <pronounciation_bn>bangla</pronounciation_bn>
        <meaning_bn>???   ?
  ,?   ???
  ?   ?</meaning_bn>
      </info_bn>
      <info_bn>
        <english>bengali</english>
      </info_bn>
    </entry>
  </dictionary>

  thanks
  -kg




From: Deepayan Sarkar deepa...@st... -2003-05-14 07:03


  Ha! A friend of mine once corrected me on this, now I can correct
  someone else :) 'pronounciation' should be spelled
  'pronunciation'.

  I'm not an expert on DTDs (though I know someone who knows much
  more, whom I can ask after we make some progress). I find it
  very difficult to understand DTDs, and much easier to understand
  examples of what the final thing would look like. Let's work that
  way, and we can write out the DTD once we decide on the 'look'.

  I don't know if you know this, but there's something called
  attributes which might be useful. For instance, with multiple
  meanings as different parts of speech.  Here's an example (I'm using
  slightly different tags) --- 'pos' is part of speech, 'plural' is
  whether the word has a plural form, etc.:

  <entry>
    <word>chhaanaa</word>
    <info pos="noun" plural="false" origin="deshi">
      <meaning>dudh theke toiri ek dhoroner ...</meaning>
      <synonyms>...</synonyms>
      <antonyms>...</antonyms>  ## ???
      <translation lang="en">cottage cheese (?)</translation>
      <pronunciation>chhaanaa</pronunciation>
    </info>
    <info pos="noun" origin="tatbhabo">  #it's probably not, but...
      <meaning>shishu, bachchaa</meaning>
      <translation lang="en">child, young</translation>  # comma separated
      <translation lang="hn">bachcha</translation>  #hindi is hn ? not sure
      <pronunciation>chhaanaa</pronunciation>
      <derivative form="the">chhaanaaTaa, chhaanaaTi</derivative>
      <derivative form="of" num="singular">chhaanaaTir</derivative>
      <derivative form="of" num="plural">chhaanaader</derivative>
    </info>
  </entry>

  (I've used romanized bengali in place of what should be bengali, but
  you get the idea.)

  I think we should handle derivative words here (and not have
  separate entries for them).
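
As a rough illustration of how attributes keep the senses of one word apart, the following Python sketch reads a shortened copy of the entry above and lists each sense with its part of speech, origin and translations. The function name senses and the shape of its return value are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example entry (romanized Bengali, as in the mail).
ENTRY = """
<entry>
  <word>chhaanaa</word>
  <info pos="noun" plural="false" origin="deshi">
    <meaning>dudh theke toiri ek dhoroner ...</meaning>
    <translation lang="en">cottage cheese (?)</translation>
  </info>
  <info pos="noun" origin="tatbhabo">
    <meaning>shishu, bachchaa</meaning>
    <translation lang="en">child, young</translation>
    <translation lang="hn">bachcha</translation>
  </info>
</entry>
"""


def senses(xml_text, lang="en"):
    """Return one dict per <info> block, with translations for one language."""
    root = ET.fromstring(xml_text)
    result = []
    for info in root.findall("info"):
        translations = [
            t.text for t in info.findall("translation") if t.get("lang") == lang
        ]
        result.append({
            "pos": info.get("pos"),
            "origin": info.get("origin"),
            "translations": translations,
        })
    return result


if __name__ == "__main__":
    for sense in senses(ENTRY):
        print(sense)
```

Because each info block carries its own attributes, a second meaning with a different origin or part of speech needs no new element names.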

[Ankur-core] XML standard for Ankur's Abhidhan

2009-05-12 Thread Salahuddin Pasha
Dear all,

I have been working on XML support for অভিধান (Abhidhan), to
enable various applications and tools to use our dictionary.

The basic work is already done, but we need to define a standard XML
format (an XML DTD or an XML Schema).

Any suggestions or comments?


Example: test XML output.

<?xml version="1.0" encoding="utf-8"?>
<dictionary>
   <search_results>
      <dict_entry id="1">
         <en_word>read</en_word>
         <pos_tag>Noun, singular or mass</pos_tag>
         <bn_word>পড়া</bn_word>
      </dict_entry>
      <dict_entry id="2">
         <en_word>read</en_word>
         <pos_tag>Verb, base form</pos_tag>
         <bn_word>পড়া</bn_word>
      </dict_entry>
      <dict_entry id="3">
         <en_word>read</en_word>
         <bn_pronunciation> উচ্চাঃ রীড</bn_pronunciation>
         <pos_tag>Verb, non-3rd person singular present</pos_tag>
         <bn_word>পাঠ করা</bn_word>
      </dict_entry>
   </search_results>
</dictionary>

regards
salahuddin


Re: [Ankur-core] XML standard for Ankur's Abhidhan

2009-05-12 Thread Golam Mortuza Hossain
Hi,

On Tue, May 12, 2009 at 5:13 PM, Salahuddin Pasha
salahuddi...@gmail.com wrote:
 Basic work is already done, but we need to define a standard XML (XML
 DTD or XML Schema).
 Example: test XML output.

 <?xml version="1.0" encoding="utf-8"?>
 <dictionary>
       <search_results>
               <dict_entry id="1">
                       <en_word>read</en_word>
                       <pos_tag>Noun, singular or mass</pos_tag>


Thanks a lot for your work.

I would suggest that you also include an entry for the Penn tag
for parts of speech (POS), such as NN, VV etc. So something like

<penn_tag>NN</penn_tag>

This would be needed if the Anubadok Online interface needs to update its
database using your XML gateway to the Ankur dictionary database.
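
For illustration only, here is a minimal Python sketch of how a client could use such a field: it reads a search result and keeps just the entries that carry a penn_tag. The sample response is adapted from the test output quoted above, with a penn_tag added to the first entry; the function tagged_entries and the decision to skip untagged entries are assumptions, not a description of how Anubadok works.

```python
import xml.etree.ElementTree as ET

# Sample response adapted from the test output, with a penn_tag added to
# the first entry only.
RESPONSE = """<?xml version="1.0" encoding="utf-8"?>
<dictionary>
  <search_results>
    <dict_entry id="1">
      <en_word>read</en_word>
      <pos_tag>Noun, singular or mass</pos_tag>
      <penn_tag>NN</penn_tag>
      <bn_word>পড়া</bn_word>
    </dict_entry>
    <dict_entry id="2">
      <en_word>read</en_word>
      <pos_tag>Verb, base form</pos_tag>
      <bn_word>পড়া</bn_word>
    </dict_entry>
  </search_results>
</dictionary>
"""


def tagged_entries(xml_bytes):
    """Return (en_word, penn_tag, bn_word) for entries that have a penn_tag."""
    root = ET.fromstring(xml_bytes)
    rows = []
    for entry in root.iter("dict_entry"):
        penn = entry.findtext("penn_tag")
        if not penn:
            continue  # skip entries that were never tagged
        rows.append((entry.findtext("en_word"), penn, entry.findtext("bn_word")))
    return rows


if __name__ == "__main__":
    # Pass bytes so the parser honours the declared utf-8 encoding.
    for row in tagged_entries(RESPONSE.encode("utf-8")):
        print(row)
```

With the short tag stored in its own element, a consumer does not have to map a descriptive string such as "Noun, singular or mass" back to NN.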

Cheers,
Golam
