Re: Dictionary

Manoj B. Narayanan Fri, 21 Jul 2017 07:13:42 -0700

Thank You Dan !!!! :)


On Fri, Jul 21, 2017 at 7:13 PM, Dan Russ <danrus...@gmail.com> wrote:

> Hi Manoj,
>     The format has been around for a long time.  Whereas I don’t think it
> predates XML, XML was probably not as ubiquitous as it is today.  However,
> it really should not be a stumbling point for you.  I believe all you need
> to do is read in the data and get the spans of the names.  One other point,
> OpenNLP has the concept of a dictionary.  Have you looked into
> opennlp.tools.dictionary.Dictionary and
> opennlp.tools.dictionary.DictionarySerializer?
> It looks like you want to create a DictionarySerializer that can read your
> format.
>
>    One last point: this question is probably better asked on the user
> mailing list.  Most of the developers are subscribed to both the user and
> dev lists.
>
> Hope it helps,
> Daniel
>
>
> > On Jul 21, 2017, at 6:54 AM, Manoj B. Narayanan <manojb.narayanan2...@gmail.com> wrote:
> >
> > Hi Jim,
> > Thanks for replying. Could you be more specific, please?
> >
> > These are the things that I am aware of:
> > 1. The training data can be of the form
> >    <START:person> Pierre Vinken <END> is a good example .
> > 2. Currently I use a file in the below format and create a 'Dictionary'
> >    from it.
> >    This is the format:
> >
> >    <entry><token>vinayak</token></entry>
> >    <entry><token>rakesh</token></entry>
> >    <entry><token>sandeep</token></entry>
> >    <entry><token>manoj</token></entry>
> >
> > And use this dictionary in the DictionaryNameFinder.
> >
> > I would like to know the advantages of using this format. Are there any
> > other formats available?
> >
> > Could you please explain more?
> >
> > Thanks.
> > Manoj
> >
> > On Fri, Jul 21, 2017 at 3:56 PM, Jim O'Regan <jaore...@tcd.ie> wrote:
> >
> >> 2017-07-19 10:48 GMT+01:00 Manoj B. Narayanan <manojb.narayanan2...@gmail.com>:
> >>
> >>> Hi all,
> >>>
> >>> I wanted to find out if there is any specific reason behind using XML
> >>> format for dictionaries for Name Finder.
> >>>
> >>
> >> It's not XML. There is a very superficial similarity in the use of <>,
> >> but, at a minimum,
> >> <START:person> Pierre Vinken <END>
> >> would need to be something like
> >> <name type="person"> Pierre Vinken </name>
> >> and the whole document would need to be enclosed by a pair of tags.
> >>
> >>
> >>> Also, is there any source from where we can get the documentation
> >>> regarding the dictionary formats for various tools (tokenizer, pos,
> >>> name finder)?
> >>>
> >>
> >> The manual: https://opennlp.apache.org/docs/1.8.1/manual/opennlp.html
> >> More specifically,
> >> tokeniser:
> >> https://opennlp.apache.org/docs/1.8.1/manual/opennlp.html#tools.tokenizer.training
> >> pos:
> >> https://opennlp.apache.org/docs/1.8.1/manual/opennlp.html#tools.postagger.training
> >> name finder:
> >> https://opennlp.apache.org/docs/1.8.1/manual/opennlp.html#tools.namefind.training
> >>
>
>
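
A minimal sketch of the suggestion in Dan's reply (read in the data and get
the spans of the names), assuming one single-token entry per line as in
Manoj's sample. It uses only the Java standard library: DictionarySpans,
loadNames and findSpans are hypothetical names made up for illustration, not
OpenNLP classes. In a real setup the names would presumably go into an OpenNLP
Dictionary and be matched by DictionaryNameFinder instead.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DictionarySpans {

    // Matches one single-token entry of the form <entry><token>name</token></entry>.
    private static final Pattern ENTRY =
            Pattern.compile("<entry>\\s*<token>([^<]+)</token>\\s*</entry>");

    /** Collects the lower-cased token of every entry found in the text. */
    public static Set<String> loadNames(String text) {
        Set<String> names = new HashSet<>();
        Matcher m = ENTRY.matcher(text);
        while (m.find()) {
            names.add(m.group(1).trim().toLowerCase(Locale.ROOT));
        }
        return names;
    }

    /** Returns {start, end} token index pairs (end exclusive) for each dictionary hit. */
    public static List<int[]> findSpans(String[] tokens, Set<String> names) {
        List<int[]> spans = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            // Case-insensitive lookup (an assumption; the thread does not say
            // whether matching should be case sensitive).
            if (names.contains(tokens[i].toLowerCase(Locale.ROOT))) {
                spans.add(new int[] {i, i + 1});
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        String data = "<entry><token>vinayak</token></entry>\n"
                + "<entry><token>manoj</token></entry>\n";
        Set<String> names = loadNames(data);

        // Already-tokenized sentence; tokenization itself is out of scope here.
        String[] sentence = {"Yesterday", "Manoj", "met", "Vinayak", "."};
        for (int[] span : findSpans(sentence, names)) {
            System.out.println("[" + span[0] + ".." + span[1] + ") " + sentence[span[0]]);
        }
    }
}
```

The sketch only handles single-token names; multi-token entries would need a
sliding window over the token array rather than a per-token lookup.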
