Thank You Dan !!!! :)
On Fri, Jul 21, 2017 at 7:13 PM, Dan Russ <danrus...@gmail.com> wrote:

> Hi Manoj,
> The format has been around for a long time. Although I don't think it
> predates XML, XML was probably not as ubiquitous then as it is today.
> However, it really should not be a stumbling block for you. I believe all
> you need to do is read in the data and get the spans of the names. One
> other point: OpenNLP has the concept of a dictionary. Have you looked into
> opennlp.tools.dictionary.Dictionary and
> opennlp.tools.dictionary.DictionarySerializer?
> It looks like you want to create a DictionarySerializer that can read your
> format.
>
> One last point: this question is probably better asked on the user
> listserve. Most of the developers are subscribed to both the user and dev
> listserves.
>
> Hope it helps,
> Daniel
>
> > On Jul 21, 2017, at 6:54 AM, Manoj B. Narayanan <
> > manojb.narayanan2...@gmail.com> wrote:
> >
> > Hi Jim,
> > Thanks for replying. Could you be more specific, please?
> >
> > These are the things that I am aware of:
> > 1. The training data can be of the form
> >    <START:person> Pierre Vinken <END> is a good example .
> > 2. Currently I use a file in the format below and create a 'Dictionary'
> >    from it. This is the format:
> >
> >    <entry><token>vinayak</token></entry>
> >    <entry><token>rakesh</token></entry>
> >    <entry><token>sandeep</token></entry>
> >    <entry><token>manoj</token></entry>
> >
> >    I then use this dictionary in the DictionaryNameFinder.
> >
> > I would like to know the advantages of using this format. Are there any
> > other formats available?
> >
> > Could you please explain more?
> >
> > Thanks.
> > Manoj
> >
> > On Fri, Jul 21, 2017 at 3:56 PM, Jim O'Regan <jaore...@tcd.ie> wrote:
> >
> >> 2017-07-19 10:48 GMT+01:00 Manoj B. Narayanan <
> >> manojb.narayanan2...@gmail.com>:
> >>
> >>> Hi all,
> >>>
> >>> I wanted to find out if there is any specific reason behind using XML
> >>> format for dictionaries for Name Finder.
> >>
> >> It's not XML. There is a very superficial similarity in the use of <>,
> >> but, at a minimum,
> >> <START:person> Pierre Vinken <END>
> >> would need to be something like
> >> <name type="person"> Pierre Vinken </name>
> >> and the whole document would need to be enclosed by a pair of tags.
> >>
> >>> Also, is there any source from where we can get the documentation
> >>> regarding the dictionary formats for various tools (tokenizer, pos,
> >>> name finder).
> >>
> >> The manual: https://opennlp.apache.org/docs/1.8.1/manual/opennlp.html
> >> More specifically,
> >> tokeniser:
> >> https://opennlp.apache.org/docs/1.8.1/manual/opennlp.html#tools.tokenizer.training
> >> pos:
> >> https://opennlp.apache.org/docs/1.8.1/manual/opennlp.html#tools.postagger.training
> >> name finder:
> >> https://opennlp.apache.org/docs/1.8.1/manual/opennlp.html#tools.namefind.training
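
Dan's suggestion above, to "read in the data and get the spans of the names", could be sketched roughly as follows. This is only an illustration in plain Java with no OpenNLP dependency: the class NameSpanSketch, its nested NameSpan type, and the parse method are hypothetical names made up for this example and are not OpenNLP's own parser. It assumes one whitespace-tokenized sentence per line, with names marked by <START:type> ... <END> as in the sample quoted in the thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: pulls name spans out of one
// whitespace-tokenized line in the <START:type> ... <END> training format.
final class NameSpanSketch {

    // Token-index span [start, end) plus its entity type (illustrative type).
    static final class NameSpan {
        final int start;
        final int end;
        final String type;

        NameSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // Appends the plain tokens of the line to tokensOut and returns the spans,
    // expressed as indices into tokensOut.
    static List<NameSpan> parse(String line, List<String> tokensOut) {
        List<NameSpan> spans = new ArrayList<>();
        int start = -1;      // index of the first token of the open name, -1 if none
        String type = null;
        for (String tok : line.trim().split("\\s+")) {
            if (tok.isEmpty()) {
                continue;    // an empty input line yields one empty token
            }
            boolean isStart = tok.equals("<START>")
                    || (tok.startsWith("<START:") && tok.endsWith(">"));
            if (isStart) {
                if (start != -1) {
                    throw new IllegalArgumentException("nested <START> tag");
                }
                start = tokensOut.size();
                // "<START:person>" gives "person"; a bare "<START>" gives "default"
                int colon = tok.indexOf(':');
                type = colon < 0 ? "default" : tok.substring(colon + 1, tok.length() - 1);
            } else if (tok.equals("<END>")) {
                if (start == -1) {
                    throw new IllegalArgumentException("<END> without <START>");
                }
                spans.add(new NameSpan(start, tokensOut.size(), type));
                start = -1;
            } else {
                tokensOut.add(tok);
            }
        }
        if (start != -1) {
            throw new IllegalArgumentException("missing <END>");
        }
        return spans;
    }

    public static void main(String[] args) {
        List<String> tokens = new ArrayList<>();
        List<NameSpan> spans =
                parse("<START:person> Pierre Vinken <END> is a good example .", tokens);
        for (NameSpan s : spans) {
            // prints: person [Pierre, Vinken]
            System.out.println(s.type + " " + tokens.subList(s.start, s.end));
        }
    }
}
```

The spans are token indices rather than character offsets, so they can be looked up directly in the token list; whether that matches what a custom dictionary reader needs depends on the input file.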