Hey Sebastian,

there is a bit more information about the DB pipeline in this pending
paper [1]. At the moment it supports two types of tokenizers: supervised
OpenNLP models, and tokenizers based on the java.text package [4]. The
latter are more like baseline tokenizers; they cover Japanese, Korean, and
Chinese, but we haven't tested their quality yet. We would be very happy
about contributions that add more tokenizers, especially for Asian
languages. A tokenizer must implement one interface for run-time [2] and
one for indexing [3].
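
To give an idea of what a java.text-based baseline tokenizer looks like,
here is a minimal sketch built on java.text.BreakIterator. The class and
method names (BaselineTokenizer, tokenize) are made up for illustration and
are not the Spotlight interfaces from [2] or [3]; it only shows the
locale-dependent word segmentation that such a tokenizer could wrap.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of DBpedia Spotlight.
class BaselineTokenizer {

    // Splits text at the word boundaries reported by BreakIterator for the
    // given locale and drops segments that contain only whitespace.
    static List<String> tokenize(String text, Locale locale) {
        BreakIterator boundaries = BreakIterator.getWordInstance(locale);
        boundaries.setText(text);

        List<String> tokens = new ArrayList<>();
        int start = boundaries.first();
        for (int end = boundaries.next(); end != BreakIterator.DONE;
                start = end, end = boundaries.next()) {
            String segment = text.substring(start, end);
            if (!segment.trim().isEmpty()) {
                tokens.add(segment);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello world.", Locale.ENGLISH));
        // Segmentation quality for Korean is exactly the untested part.
        System.out.println(tokenize("한국어 문장입니다.", Locale.KOREAN));
    }
}
```

For the English input this yields the tokens "Hello", "world" and "."; how
well the Korean segmentation works is something that would have to be
checked.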

> KAIST institute from Korea has developed their own POS tagger[1] with a
> NIF serializer, so we might need to integrate this in some form.


In this pipeline, part-of-speech tagging is only used as part of Named
Entity Recognition and noun phrase chunking, so either you need to do
tokenization & POS tagging & (NP chunking || NER) or only tokenization.
There are more details about this in [1].
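
Spelled out as a check, the rule above could look like the following
sketch. The enum and method names are hypothetical and only encode the
sentence above (tokenization alone, or tokenization plus POS tagging plus
at least one of NP chunking and NER); they are not taken from the Spotlight
code base.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical illustration of the stage combinations described above.
class PipelineStages {

    enum Stage { TOKENIZATION, POS_TAGGING, NP_CHUNKING, NER }

    // True for "only tokenization" or for
    // "tokenization & POS tagging & (NP chunking || NER)".
    static boolean isSupported(Set<Stage> stages) {
        boolean tokenizationOnly =
                stages.equals(EnumSet.of(Stage.TOKENIZATION));
        boolean fullChain = stages.contains(Stage.TOKENIZATION)
                && stages.contains(Stage.POS_TAGGING)
                && (stages.contains(Stage.NP_CHUNKING)
                        || stages.contains(Stage.NER));
        return tokenizationOnly || fullChain;
    }

    public static void main(String[] args) {
        System.out.println(isSupported(EnumSet.of(Stage.TOKENIZATION)));
        System.out.println(isSupported(
                EnumSet.of(Stage.TOKENIZATION, Stage.POS_TAGGING)));
        System.out.println(isSupported(
                EnumSet.of(Stage.TOKENIZATION, Stage.POS_TAGGING, Stage.NER)));
    }
}
```

So a POS tagger on its own (without NP chunking or NER on top of it) would
not be used by this pipeline.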

Hope that helps,
Jo

[1] http://jodaiber.de/doc/entity.pdf
[2]
https://github.com/dbpedia-spotlight/dbpedia-spotlight/blob/master/core/src/main/scala/org/dbpedia/spotlight/db/model/TextTokenizer.scala
[3]
https://github.com/dbpedia-spotlight/dbpedia-spotlight/blob/master/core/src/main/scala/org/dbpedia/spotlight/db/model/StringTokenizer.scala
[4]
http://www.oracle.com/technetwork/java/javase/locales-137662.html#util-text


On Sun, Apr 21, 2013 at 12:00 PM, Sebastian Hellmann <
[email protected]> wrote:

> Hi Max,
> yes I guess this was confusing. Actually, I wanted to post this link:
>
> https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Internationalization
> The text there is quite confusing.
>
> Actually, I wanted to be pointed to a best practice. Maybe we can do it
> language-wise. For English, German, Dutch and Greek, pipelines seem to
> exist, but I didn't quite figure out which method they use.
>
> I will dig into this during the next week, however. It might make
> sense to create a NIF reader for POS taggers.
> KAIST institute from Korea has developed their own POS tagger[1] with a
> NIF serializer, so we might need to integrate this in some form.
>
> All the best,
> Sebastian
>
>
>
> [1] http://sourceforge.net/projects/hannanum/
>
> On 19.04.2013 18:19, Max Jakob wrote:
> > On Fri, Apr 19, 2013 at 2:08 PM, Sebastian Hellmann
> > <[email protected]> wrote:
> >> Can somebody tell me what the best way is to create a Korean Spotlight?
> >> http://wiki.dbpedia.org/Internationalization
> >> seems to be outdated. It also confused us a lot.
> > For clarification, the link you provided points to the i18n page for
> > DBpedia (the knowledge base).
> > For DBpedia Spotlight (the annotation tool), there are two different
> > backends and each has its own creation pipeline:
> >
> > 1. Lucene backed core pipeline (older):
> >
> > https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Internationalization-(Lucene-backed-core)
> >
> > 2. DB backed core pipeline (newer; you probably want this one):
> >
> > https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Internationalization-(DB-backed-core)
> >     Small project to make this pipeline even easier to use:
> >     https://github.com/jodaiber/model-quickstarter
> >
> > Hope this helps to clear things up.
> > Cheers,
> > Max
> >
>
>
> --
> Dipl. Inf. Sebastian Hellmann
> Department of Computer Science, University of Leipzig
> Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
> http://dbpedia.org/Wiktionary , http://dbpedia.org
> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
> Research Group: http://aksw.org
>
>
> ------------------------------------------------------------------------------
> Precog is a next-generation analytics platform capable of advanced
> analytics on semi-structured data. The platform includes APIs for building
> apps and a phenomenal toolset for data science. Developers can use
> our toolset for easy data analysis & visualization. Get a free account!
> http://www2.precog.com/precogplatform/slashdotnewsletter
> _______________________________________________
> Dbp-spotlight-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbp-spotlight-users
>