On 15 April 2013 16:49, Andrea Di Menna <[email protected]> wrote:
> Hi Pablo,
>
> there is already a DBpedia Core module actually:
>
> https://github.com/dbpedia/extraction-framework/tree/master/core
>
> Is that correct JC?

Sure, but it's rather large. I guess what Pablo means is a module that
only contains a few classes that are the 'real core', and I totally
agree that it would be nice to have something like that, or even
several smaller modules where each one covers only a certain aspect
that some kind of Wikipedia extraction may need. The current 'core'
module contains many classes that are actually rather specific and not
'basic' at all.

Cheers,
JC


>
> Cheers,
> Andrea
>
>
> 2013/4/15 Pablo N. Mendes <[email protected]>
>
>>
>> Hi Jona,
>> thanks! IIRC, we decided against adding the DEF (DBpedia Extraction
>> Framework) as a dependency to pignlproc in order to reduce the size of the
>> jar that has to be shipped to each hadoop node. So I think somebody just
>> snagged the code into our codebase.
>>
>> It would be very neat if these reusable classes were somehow separated
>> into a "DBpedia Core" module that could be imported by any project that
>> depends on DBpedia. We also use the similar Disambiguation class, and the
>> WikiUtil for encoding/decoding URIs.
>>
>> Cheers,
>> Pablo
>>
>>
>> On Mon, Apr 15, 2013 at 4:40 PM, Jona Christopher Sahnwaldt
>> <[email protected]> wrote:
>>>
>>> On 15 April 2013 16:31, Wang Wei <[email protected]> wrote:
>>> > Hi Pablo,
>>> > I have updated the Internationalization-(DB-backed-core) page. There are
>>> > some inconsistencies between the webpage and the index_db.sh script, e.g.,
>>> > the paths. I think there are also some problems with index_db.sh itself.
>>> > I'll check it after I finish downloading the Wikipedia dump. I have
>>> > already set up the clusters and environment.
>>> >
>>> > https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Internationalization-(DB-backed-core)
>>> >
>>> > I know the Chinese language, but the current getRedirectPatterns() in
>>> >
>>> > https://github.com/dbpedia-spotlight/pignlproc/blob/master/src/main/java/pignlproc/markup/AnnotatingMarkupParser.java
>>> > does not support Chinese. Anyway, I will try to add it.
>>>
>>> Hi everyone @Spotlight,
>>>
>>> if you want, you could use
>>>
>>> https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/wikiparser/impl/wikipedia/Redirect.scala
>>> for a list of redirect tags. That class is generated
>>> semi-automatically from downloaded Wikipedia settings. I have no idea
>>> how much effort it would be to integrate that class (or the generating
>>> process) into DBpedia Spotlight and if it would be worth the effort.
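
A minimal sketch of how a per-language list of redirect tags, like the one mentioned above, could be turned into a matcher. This is not the code of Redirect.scala or of getRedirectPatterns(); the REDIRECT_TAGS table, the function names, and the Chinese tag shown here are assumptions for illustration only, and the real tags would come from the downloaded Wikipedia settings.

```python
import re

# Hypothetical table: language code -> redirect tags accepted in wikitext.
# The "zh" entry is an assumption for illustration, not taken from the thread.
REDIRECT_TAGS = {
    "en": ["#REDIRECT"],
    "zh": ["#REDIRECT", "#重定向"],
}


def redirect_pattern(lang):
    """Build a regex that matches a redirect line for the given language."""
    tags = REDIRECT_TAGS.get(lang, REDIRECT_TAGS["en"])
    alternatives = "|".join(re.escape(tag) for tag in tags)
    # Tag, optional colon, then the link target up to "]", "|" or "#".
    return re.compile(
        r"^\s*(?:%s)\s*:?\s*\[\[([^\]|#]+)" % alternatives,
        re.IGNORECASE,
    )


def redirect_target(wikitext, lang):
    """Return the redirect target title, or None if the text is no redirect."""
    match = redirect_pattern(lang).match(wikitext)
    return match.group(1).strip() if match else None
```

With a table like this, supporting another language means adding one entry rather than changing the parser logic.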
>>>
>>> Cheers,
>>> JC
>>>
>>> >
>>> > I also moved the user's manual page from the wiki to GitHub:
>>> >
>>> > https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/User's-manual
>>> >
>>> > However, this page overlaps heavily with the web service page
>>> > (https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Web-service).
>>> > In fact, it seems a little thin for a user's manual. Some details
>>> > should be added, e.g., the programmatic usage part.
>>> >
>>> > I will report my progress later.
>>> > Thanks for your guidance! People in the open source community are really
>>> > nice. It would be my pleasure to contribute to this community.
>>> >
>>> > Best Regards,
>>> > Wei Wang
>>> >
>>> >
>>> > On Mon, Apr 15, 2013 at 6:28 PM, Pablo N. Mendes
>>> > <[email protected]>
>>> > wrote:
>>> >>
>>> >> Hi Wei,
>>> >> Thanks for your interest. Can you share with us (e.g. via links to
>>> >> GitHub) the results from your warm-up tasks so far?
>>> >>
>>> >> I have another warm-up task proposal. If you know anything about
>>> >> Chinese (I have Chinese friends with family name Wang, so sorry if I
>>> >> make incorrect assumptions), you could try to run the Indexing (DB core)
>>> >> for the Chinese language, or identify the reasons why this process would
>>> >> not work.
>>> >>
>>> >>
>>> >> https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Internationalization-(DB-backed-core)
>>> >>
>>> >> You could send us questions that you may have, so that we can improve
>>> >> the documentation on the page above.
>>> >>
>>> >> Please note that this does not guarantee your acceptance to GSoC.
>>> >> However, doing well in open source community participation is usually a
>>> >> great plus during the selection phase. So the advice to start
>>> >> participating early goes for all prospective applicants.
>>> >>
>>> >> Cheers,
>>> >> Pablo
>>> >>
>>> >>
>>> >> On Sun, Apr 14, 2013 at 4:00 PM, Wang Wei <[email protected]>
>>> >> wrote:
>>> >>>
>>> >>> Hi all,
>>> >>>
>>> >>> I am Wei Wang. After trying some warm-up tasks, I think it is time to
>>> >>> make some noise here.
>>> >>>
>>> >>> A few months ago, I worked on a project in which I annotated 6 million
>>> >>> tweets and Facebook posts with Wikipedia concepts (each Wikipedia
>>> >>> article is regarded as a concept). Specifically, with the help of
>>> >>> Wikipedia-Miner (http://wikipedia-miner.cms.waikato.ac.nz/), I processed
>>> >>> the October 2012 English Wikipedia dump on a Hadoop cluster to extract
>>> >>> some metadata. Then I built a concept dictionary (with 9 million
>>> >>> entities) which maps phrases (concept mentions) to target concepts
>>> >>> (Wikipedia articles). For each tweet and post, the concept mentions were
>>> >>> recognized by looking them up in the dictionary. Disambiguation was then
>>> >>> performed by context analysis.
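
The dictionary lookup described above can be sketched roughly as follows. This is a toy illustration, not the Wikipedia-Miner or DBpedia Spotlight implementation: the MENTIONS dictionary, MAX_WORDS, and spot_mentions are hypothetical names, and a real dictionary would hold millions of entries.

```python
# Toy dictionary: lower-cased surface phrase -> candidate article titles.
# All entries are made up for illustration.
MENTIONS = {
    "it's late": ["It's Late"],
    "new york": ["New York City", "New York (state)"],
    "york": ["York"],
}
MAX_WORDS = 3  # longest phrase (in tokens) tried at each position


def spot_mentions(text):
    """Greedy longest-match lookup of dictionary phrases in whitespace tokens."""
    tokens = text.split()
    found = []
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest window first, then shrink it.
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).lower().strip('.,!?"')
            if phrase in MENTIONS:
                found.append((phrase, MENTIONS[phrase]))
                matched = n
                break
        # Skip past a matched phrase, otherwise advance by one token.
        i += matched if matched else 1
    return found


print(spot_mentions("I flew to New York."))
```

A lookup like this cannot tell whether a matched phrase is really meant as a concept, and a phrase may map to several candidates, which is why a separate disambiguation step is needed afterwards.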
>>> >>>
>>> >>> In fact, what I did is just one of the functions provided by DBpedia
>>> >>> Spotlight. But through this project, I realized the importance and
>>> >>> challenges of DBpedia Spotlight. For example, by annotating text with
>>> >>> concepts, computers are able to understand the semantics of the text.
>>> >>> However, there are two challenges for this annotation work. Firstly,
>>> >>> how to recognize the concept mentions? Phrases are often falsely
>>> >>> recognized as concept mentions. E.g., given the sentence "It's late, I
>>> >>> have to go now", since there is a Wikipedia article about a song named
>>> >>> "It's late", it is likely that the phrase in this sentence would be
>>> >>> linked to that article. It is also possible that some true concept
>>> >>> mentions are not recognized due to limited dictionary coverage and text
>>> >>> noise. Secondly, it is well known that some mentions are ambiguous.
>>> >>> Thus, how to disambiguate them accurately and efficiently is another
>>> >>> challenge, especially for short text. This is still an ongoing research
>>> >>> topic.
>>> >>>
>>> >>> Regarding idea 3.1 (Google mention corpus), I think the overlap between
>>> >>> the Google mention corpus and the Wikipedia dump is a point that should
>>> >>> be considered. Otherwise we may index some redundant data. (Please
>>> >>> correct me, since I have little knowledge about this part and am still
>>> >>> reading the related code.)
>>> >>>
>>> >>> For 3.2, I think it's really important. From my experience, the
>>> >>> disambiguation procedure is time-consuming, because content analysis is
>>> >>> usually involved.
>>> >>>
>>> >>> So far, I have done some warm-up tasks like documentation. I am trying
>>> >>> out the software and learning Scala. I will share my thoughts regarding
>>> >>> the two ideas later. Thanks.
>>> >>>
>>> >>> Best Regards,
>>> >>> Wei Wang
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> ------------------------------------------------------------------------------
>>> >>> Precog is a next-generation analytics platform capable of advanced
>>> >>> analytics on semi-structured data. The platform includes APIs for
>>> >>> building
>>> >>> apps and a phenomenal toolset for data science. Developers can use
>>> >>> our toolset for easy data analysis & visualization. Get a free
>>> >>> account!
>>> >>> http://www2.precog.com/precogplatform/slashdotnewsletter
>>> >>> _______________________________________________
>>> >>> Dbpedia-gsoc mailing list
>>> >>> [email protected]
>>> >>> https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
>>> >>>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >>
>>> >> Pablo N. Mendes
>>> >> http://pablomendes.com
>>> >
>>> >
>>> >
>>> >
>>> >
>>
>>
>>
>>
>> --
>>
>> Pablo N. Mendes
>> http://pablomendes.com
>>
>>
>>
>
