On Apr 16, 2013 3:45 PM, "Dimitris Kontokostas" <[email protected]> wrote:
>
> Hi Jo,
>
> This is a good interdisciplinary task ;)
>
> About the extraction script, DBpedia now uses a predefined folder
structure for locating dumps / extracting data and follows the Wikipedia
dumps structure [1].
>
> There are two options here
> 1) Spotlight adapts the configuration to accommodate that
> 2) DBpedia makes the dump easier to run with arbitrary mediawiki dumps
and output folders.
>
> Maybe (1) is a lot easier but I'd vote for (2). ;)
> For (2) what we need is to create 2 new scripts for download / extract
that will be based on [2] & [3].
> Once we have a volunteer we can discuss this in detail
If the desired new folder/file name structure is reasonably similar, we
don't really need to create new scripts; we basically need to turn
https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/util/Finder.scala
into an interface and provide different implementations: one is the current
finder, the other would be a new one. Finder.scala already is a Strategy
pattern; now we just have to make it configurable.
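To make the idea concrete, here is a minimal sketch of what that refactoring could look like. The trait name, method signature, and directory layouts below are hypothetical and do not match the actual Finder API in the extraction framework; it only illustrates the Strategy-pattern split between the current Wikimedia-style layout and an arbitrary flat layout:

```scala
import java.io.File

// Hypothetical interface extracted from the file-locating logic, so that
// alternative dump/output layouts can be plugged in via configuration.
trait DumpFinder {
  def dumpFile(language: String, date: String, suffix: String): File
}

// Mimics the Wikimedia dumps layout:
//   base/<lang>wiki/<date>/<lang>wiki-<date>-<suffix>
class WikimediaFinder(baseDir: File) extends DumpFinder {
  def dumpFile(language: String, date: String, suffix: String): File = {
    val wiki = language + "wiki"
    new File(new File(new File(baseDir, wiki), date), s"$wiki-$date-$suffix")
  }
}

// A flat layout for arbitrary MediaWiki dumps, as option (2) suggests:
// everything lives directly under one user-chosen folder.
class FlatFinder(baseDir: File) extends DumpFinder {
  def dumpFile(language: String, date: String, suffix: String): File =
    new File(baseDir, suffix)
}
```

The extraction and download jobs would then accept a DumpFinder chosen from a config property instead of constructing the current finder directly.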
>
> Cheers,
> Dimitris
>
>
> [1] http://dumps.wikimedia.org/
> [2]
https://github.com/dbpedia/extraction-framework/blob/master/dump/src/main/scala/org/dbpedia/extraction/dump/extract/Extraction.scala
> [3]
https://github.com/dbpedia/extraction-framework/blob/master/dump/src/main/scala/org/dbpedia/extraction/dump/download/Download.scala
>
>
> On Tue, Apr 16, 2013 at 1:29 PM, Joachim Daiber <[email protected]>
wrote:
>>
>> Hey all,
>>
>> I added this task to the Spotlight ideas, it's smallish, so it's maybe
more of a warm-up task:
>>
>> ----
>>
>> For creating Spotlight models, we need instance_types.nt, redirects.nt
and disambiguations.nt. Since we want these to be from the same Wikipedia
dump as the one from which we create the model, integrate the DBpedia
extraction into the index_db.sh script in DBpedia Spotlight, so that the
files are automatically produced during indexing.
>>
>> ----
>>
>> Maybe somebody who knows DEF better than I could comment on how
complicated this would be to do. We have the Wikipedia dump and we need
redirects, disambiguation pages and instance types for this version of the
dump.
>>
>> Best,
>> Jo
>>
>>
------------------------------------------------------------------------------
>> Precog is a next-generation analytics platform capable of advanced
>> analytics on semi-structured data. The platform includes APIs for
building
>> apps and a phenomenal toolset for data science. Developers can use
>> our toolset for easy data analysis & visualization. Get a free account!
>> http://www2.precog.com/precogplatform/slashdotnewsletter
>> _______________________________________________
>> Dbpedia-gsoc mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
>>
>
>
>
> --
> Kontokostas Dimitris
>
>