Re: [Dbpedia-gsoc] Experience and thoughts for DBpedia Spotlight Ideas

Joachim Daiber Mon, 15 Apr 2013 07:47:00 -0700

Hey Jona,

I already copied over the content of an older version of that file because
I didn't want to add a dependency on the DEF (DBpedia Extraction Framework)
to pignlproc. But to avoid repeating ourselves, it would be good to
eventually use Redirect.scala from DBpedia directly.


Best,
Jo
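A minimal sketch of how per-language redirect detection could work, for example to give a getRedirectPatterns()-style helper Chinese support. The class and method names are hypothetical, not pignlproc's actual API, and the keyword table is a small hand-written subset (English `#REDIRECT`, German `#WEITERLEITUNG`, Chinese `#重定向`). The complete, generated lists are the ones in Redirect.scala.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper: builds a redirect pattern from per-language keywords. */
public class RedirectPatterns {

    // Illustrative subset only; a real table would be generated from the
    // Wikipedia settings, the way Redirect.scala is in the extraction framework.
    static final Map<String, List<String>> REDIRECT_KEYWORDS = Map.of(
            "en", List.of("#REDIRECT"),
            "de", List.of("#WEITERLEITUNG", "#REDIRECT"),
            "zh", List.of("#重定向", "#REDIRECT"));

    /** Matches "<keyword> [[Target]]" at the very start of the page text. */
    static Pattern patternFor(String lang) {
        // Unknown languages fall back to the English keyword.
        List<String> keywords = REDIRECT_KEYWORDS.getOrDefault(lang, List.of("#REDIRECT"));
        String alternatives = keywords.stream()
                .map(Pattern::quote) // escape '#' and any other special characters
                .collect(Collectors.joining("|"));
        // Tolerate an optional colon after the keyword; stop the captured title
        // at ']', '|' or '#' so only the page title is returned.
        return Pattern.compile("^\\s*(?:" + alternatives + ")\\s*:?\\s*\\[\\[([^\\]|#]+)",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    /** Returns the redirect target, or empty if the page is not a redirect. */
    static Optional<String> redirectTarget(String lang, String wikitext) {
        Matcher m = patternFor(lang).matcher(wikitext);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(redirectTarget("zh", "#重定向 [[北京市]]").orElse("-"));    // 北京市
        System.out.println(redirectTarget("en", "#redirect [[Beijing]]").orElse("-")); // Beijing
        System.out.println(redirectTarget("en", "Beijing is a city.").orElse("-"));    // -
    }
}
```

Keeping the keywords in a table and the regex construction in one method means that swapping the hand-written map for generated data (as in Redirect.scala) would not touch the matching logic.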



On Mon, Apr 15, 2013 at 4:40 PM, Jona Christopher Sahnwaldt <[email protected]> wrote:

> On 15 April 2013 16:31, Wang Wei <[email protected]> wrote:
> > Hi Pablo,
> > I have updated the Internationalization-(DB-backed-core) page. There are
> > some inconsistencies between the page and the index_db.sh script, e.g.,
> > the paths. I think there are also some problems in index_db.sh itself;
> > I'll check them after I finish downloading the Wikipedia dump. I have
> > already set up the clusters and the environment.
> > https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Internationalization-(DB-backed-core)
> >
> > I know Chinese, but the current getRedirectPatterns() in
> > https://github.com/dbpedia-spotlight/pignlproc/blob/master/src/main/java/pignlproc/markup/AnnotatingMarkupParser.java
> > does not support Chinese. Anyway, I will try to add it.
>
> Hi everyone @Spotlight,
>
> if you want, you could use
>
> https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/wikiparser/impl/wikipedia/Redirect.scala
> for a list of redirect tags. That class is generated
> semi-automatically from downloaded Wikipedia settings. I have no idea
> how much effort it would take to integrate that class (or the generating
> process) into DBpedia Spotlight, or whether it would be worth it.
>
> Cheers,
> JC
>
> >
> > I also moved the user's manual page from the wiki to GitHub:
> > https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/User's-manual.
> > However, it overlaps heavily with the web service page
> > (https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Web-service).
> > It is also rather thin for a user's manual; some details should be
> > added, e.g., the programmatic usage part.
> >
> > I will report my progress later.
> > Thanks for your guidance! People in the open source community are really
> > nice. It would be my pleasure to contribute to this community.
> >
> > Best Regards,
> > Wei Wang
> >
> >
> > On Mon, Apr 15, 2013 at 6:28 PM, Pablo N. Mendes <[email protected]> wrote:
> >>
> >> Hi Wei,
> >> Thanks for your interest. Can you share with us (e.g. via links to GitHub)
> >> the results from your warm-up tasks so far?
> >>
> >> I have another warm-up task proposal. If you know Chinese (I have Chinese
> >> friends with the family name Wang, so sorry if I am making an incorrect
> >> assumption), you could try to run the indexing (DB-backed core) for the
> >> Chinese language, or identify the reasons why this process would not
> >> work.
> >>
> >>
> https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Internationalization-(DB-backed-core)
> >>
> >> Feel free to send us any questions you have, so that we can improve the
> >> documentation on the page above.
> >>
> >> Please note that this does not guarantee your acceptance into GSoC.
> >> However, active participation in the open source community is usually a
> >> big plus during the selection phase, so the advice to start participating
> >> early applies to all prospective applicants.
> >>
> >> Cheers,
> >> Pablo
> >>
> >>
> >> On Sun, Apr 14, 2013 at 4:00 PM, Wang Wei <[email protected]> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> I am Wei Wang. After trying some warm-up tasks, I think it is time to
> >>> make some noise here.
> >>>
> >>> A few months ago, I worked on a project in which I annotated 6 million
> >>> tweets and Facebook posts with Wikipedia concepts (each Wikipedia article
> >>> is regarded as a concept). Specifically, with the help of
> >>> Wikipedia-Miner (http://wikipedia-miner.cms.waikato.ac.nz/), I processed
> >>> the October 2012 English Wikipedia dump on a Hadoop cluster to extract
> >>> metadata. Then I built a concept dictionary (with 9 million entities)
> >>> that maps phrases (concept mentions) to target concepts (Wikipedia
> >>> articles). For each tweet and post, the concept mentions were recognized
> >>> by looking them up in the dictionary and then disambiguated by context
> >>> analysis.
> >>>
> >>> In fact, what I did is just one of the functions DBpedia Spotlight
> >>> provides. But through this project, I came to appreciate both the
> >>> importance and the challenges of DBpedia Spotlight's task. Annotating
> >>> text with concepts enables computers to understand the semantics of the
> >>> text. However, this annotation work poses two challenges. First, how do
> >>> we recognize concept mentions? Phrases are often wrongly recognized as
> >>> concept mentions (false positives). E.g., in the sentence "It's late, I
> >>> have to go now", the phrase "It's late" is likely to be linked to the
> >>> Wikipedia article about the song "It's late", although it does not refer
> >>> to that song here. Conversely, some true concept mentions may be missed
> >>> because of limited dictionary coverage and noisy text. Second, it is well
> >>> known that some mentions are ambiguous, so disambiguating them accurately
> >>> and efficiently is another challenge, especially for short texts. This
> >>> is still an ongoing research topic.
> >>>
> >>> Regarding idea 3.1 (Google mention corpus), I think the overlap between
> >>> the Google mention corpus and the Wikipedia dump should be considered;
> >>> otherwise we may index redundant data. (Please correct me if I'm wrong,
> >>> since I have little knowledge about this part yet; I am reading the
> >>> related code.)
> >>>
> >>> Idea 3.2 is, I think, really important. In my experience, the
> >>> disambiguation procedure is time-consuming because it usually involves
> >>> context analysis.
> >>>
> >>> So far, I have done some warm-up tasks, such as documentation. I am
> >>> trying out the software and learning Scala. I will share my thoughts on
> >>> the two ideas later. Thanks.
> >>>
> >>> Best Regards,
> >>> Wei Wang
> >>>
> >>>
> >>>
> >>>
> ------------------------------------------------------------------------------
> >>> Precog is a next-generation analytics platform capable of advanced
> >>> analytics on semi-structured data. The platform includes APIs for
> >>> building
> >>> apps and a phenomenal toolset for data science. Developers can use
> >>> our toolset for easy data analysis & visualization. Get a free account!
> >>> http://www2.precog.com/precogplatform/slashdotnewsletter
> >>> _______________________________________________
> >>> Dbpedia-gsoc mailing list
> >>> [email protected]
> >>> https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
> >>>
> >>
> >>
> >>
> >> --
> >>
> >> Pablo N. Mendes
> >> http://pablomendes.com
> >
> >
> >
> >
> >
>
>
>
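The pipeline Wei describes in the thread (a phrase-to-concept dictionary, mention recognition by dictionary lookup, then disambiguation using context) can be sketched as a toy. This is an illustration under stated assumptions, not how Wikipedia-Miner or DBpedia Spotlight actually work: the dictionary and the per-concept context words are made-up data, spotting is a greedy longest match over lowercased tokens, and disambiguation simply picks the candidate whose context words overlap most with the text, a deliberately simple stand-in for real context analysis.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Toy sketch: dictionary-based mention spotting plus bag-of-words disambiguation. */
public class MentionSpotter {

    /** A recognized mention and the concept (article) it was linked to. */
    static final class Annotation {
        final String surfaceForm;
        final String concept;

        Annotation(String surfaceForm, String concept) {
            this.surfaceForm = surfaceForm;
            this.concept = concept;
        }

        @Override
        public String toString() {
            return surfaceForm + " -> " + concept;
        }
    }

    private final Map<String, List<String>> dictionary;  // lowercased phrase -> candidate concepts
    private final Map<String, Set<String>> contextWords; // concept -> words typical of its article
    private final int maxPhraseLength;                    // longest phrase (in tokens) to look up

    MentionSpotter(Map<String, List<String>> dictionary,
                   Map<String, Set<String>> contextWords, int maxPhraseLength) {
        this.dictionary = dictionary;
        this.contextWords = contextWords;
        this.maxPhraseLength = maxPhraseLength;
    }

    List<Annotation> annotate(String text) {
        // Lowercase and split on anything that is not a letter, digit or apostrophe.
        List<String> tokens = Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}']+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
        Set<String> context = new HashSet<>(tokens);
        List<Annotation> result = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 0;
            // Greedy longest match: try the longest phrase starting at token i first.
            for (int len = Math.min(maxPhraseLength, tokens.size() - i); len > 0; len--) {
                String phrase = String.join(" ", tokens.subList(i, i + len));
                List<String> candidates = dictionary.get(phrase);
                if (candidates != null && !candidates.isEmpty()) {
                    result.add(new Annotation(phrase, disambiguate(candidates, context)));
                    matched = len;
                    break;
                }
            }
            i += Math.max(matched, 1); // skip past the mention, or advance one token
        }
        return result;
    }

    /** Picks the candidate sharing the most words with the text; ties keep the earlier one. */
    private String disambiguate(List<String> candidates, Set<String> context) {
        String best = candidates.get(0);
        int bestOverlap = -1;
        for (String candidate : candidates) {
            int overlap = 0;
            for (String word : contextWords.getOrDefault(candidate, Set.of())) {
                if (context.contains(word)) {
                    overlap++;
                }
            }
            if (overlap > bestOverlap) {
                best = candidate;
                bestOverlap = overlap;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up toy data for illustration only.
        MentionSpotter spotter = new MentionSpotter(
                Map.of("apple", List.of("Apple_Inc.", "Apple"),
                       "it's late", List.of("It's_late_(song)")),
                Map.of("Apple_Inc.", Set.of("iphone", "company"),
                       "Apple", Set.of("fruit", "tree")),
                3);
        System.out.println(spotter.annotate("I bought an iPhone from Apple yesterday")); // [apple -> Apple_Inc.]
        System.out.println(spotter.annotate("It's late, I have to go now"));             // [it's late -> It's_late_(song)]
    }
}
```

Because this naive spotter links every dictionary phrase it finds, the second call reproduces exactly the "It's late" false positive Wei mentions; deciding which spots to keep before disambiguation is where that problem would have to be tackled.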
