Hi Manoj,
    The format has been around for a long time.  While I don’t think it 
predates XML, XML was probably not as ubiquitous then as it is today.  However, 
it really should not be a stumbling block for you.  I believe all you need to 
do is read in the data and get the spans of the names.  One other point: 
OpenNLP has the concept of a dictionary.  Have you looked into 
opennlp.tools.dictionary.Dictionary and 
opennlp.tools.dictionary.serializer.DictionarySerializer?  It looks like you 
want to create a DictionarySerializer that can read your format.
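
To make "read in the data and get the spans of the names" a bit more concrete, 
here is a rough sketch in plain Java.  It does not use any OpenNLP classes; 
NameSpanReader, NameSpan and parse are names I made up for illustration, and 
the "default" type for an untyped <START> tag is my own assumption.  It only 
handles whitespace-separated tokens with <START:type> ... <END> markers, so 
treat it as a starting point rather than how OpenNLP itself parses the data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP.
class NameSpanReader {

    /** Half-open token range [start, end) plus an entity type. */
    static final class NameSpan {
        final int start;
        final int end;
        final String type;

        NameSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        @Override
        public String toString() {
            return "[" + start + ".." + end + ") " + type;
        }
    }

    /**
     * Strips the <START:type> and <END> markers from one line of training
     * data, returns the remaining tokens, and appends one NameSpan per
     * annotated name to the supplied list.
     */
    static List<String> parse(String line, List<NameSpan> spans) {
        List<String> tokens = new ArrayList<>();
        int start = -1;      // token index where the open name began, or -1
        String type = null;  // type of the open name, if any

        for (String tok : line.trim().split("\\s+")) {
            if (tok.isEmpty()) {
                continue; // blank input line
            }
            if (tok.startsWith("<START") && tok.endsWith(">")) {
                start = tokens.size();
                int colon = tok.indexOf(':');
                // Assumption: an untyped <START> gets the type "default".
                type = colon >= 0
                        ? tok.substring(colon + 1, tok.length() - 1)
                        : "default";
            } else if (tok.equals("<END>")) {
                if (start >= 0) {
                    spans.add(new NameSpan(start, tokens.size(), type));
                }
                start = -1;
                type = null;
            } else {
                tokens.add(tok);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<NameSpan> spans = new ArrayList<>();
        List<String> tokens = parse(
                "<START:person> Pierre Vinken <END> is a good example .", spans);
        System.out.println(tokens);
        for (NameSpan s : spans) {
            System.out.println(s + " -> "
                    + String.join(" ", tokens.subList(s.start, s.end)));
        }
    }
}
```

The spans are token offsets, not character offsets, so the text of each name 
can be recovered with tokens.subList(start, end).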

   One last point: this question is probably better asked on the user 
mailing list.  Most of the developers are subscribed to both the user and dev 
lists.

Hope it helps,
Daniel


> On Jul 21, 2017, at 6:54 AM, Manoj B. Narayanan 
> <manojb.narayanan2...@gmail.com> wrote:
> 
> Hi Jim,
> Thanks for replying. Could you be more specific, please?
> 
> These are the things that I am aware of:
> 1. The training data can be of the form  <START:person> Pierre Vinken <END>
> is a good example .
> 2. Currently I use a file in the below format and create a 'Dictionary'
> from it.
>    This is the format
> 
> <entry><token>vinayak</token></entry>
> <entry><token>rakesh</token></entry>
> <entry><token>sandeep</token></entry>
> <entry><token>manoj</token></entry>
> 
> And use this dictionary in the DictionaryNameFinder.
> 
> I would like to know the advantages of using this format. Are there any
> other formats available?
> 
> Could you please explain more.
> 
> Thanks.
> Manoj
> 
> On Fri, Jul 21, 2017 at 3:56 PM, Jim O'Regan <jaore...@tcd.ie> wrote:
> 
>> 2017-07-19 10:48 GMT+01:00 Manoj B. Narayanan <
>> manojb.narayanan2...@gmail.com>:
>> 
>>> Hi all,
>>> 
>>> I wanted to find out if there is any specific reason behind using XML
>>> format for dictionaries for Name Finder.
>>> 
>> 
>> It's not XML. There is a very superficial similarity in the use of <>, but,
>> at a minimum
>> <START:person> Pierre Vinken <END>
>> would need to be something like
>> <name type="person"> Pierre Vinken </name>
>> and the whole document would need to be enclosed by a pair of tags.
>> 
>> 
>>> Also, is there any source from where we can get the documentation regarding
>>> the dictionary formats for various tools (tokenizer, pos, name finder).
>>> 
>> 
>> The manual: https://opennlp.apache.org/docs/1.8.1/manual/opennlp.html
>> More specifically,
>> tokeniser:
>> https://opennlp.apache.org/docs/1.8.1/manual/opennlp.html#tools.tokenizer.training
>> pos:
>> https://opennlp.apache.org/docs/1.8.1/manual/opennlp.html#tools.postagger.training
>> name finder:
>> https://opennlp.apache.org/docs/1.8.1/manual/opennlp.html#tools.namefind.training
>> 
