That would be super cool!
Cheers
Andrea
2013/4/16 Pablo N. Mendes <[email protected]>
>
> Or we run DEF extraction on Hadoop. :)
>
> Another task idea?
>
> Cheers,
> Pablo
>
>
>
> On Tue, Apr 16, 2013 at 4:34 PM, Joachim Daiber
> <[email protected]> wrote:
>
>> Hey,
>>
>> so far, we download the Wikipedia dumps straight into HDFS. For the
>> DBpedia extraction, we would store the dumps locally first, so we can use
>> any directory structure that makes it easier.
>>
>> Best,
>> Jo
>>
>>
>> On Tue, Apr 16, 2013 at 4:19 PM, Jona Christopher Sahnwaldt
>> <[email protected]> wrote:
>>
>>>
>>> On Apr 16, 2013 3:45 PM, "Dimitris Kontokostas" <[email protected]>
>>> wrote:
>>> >
>>> > Hi Jo,
>>> >
>>> > This is a good interdisciplinary task ;)
>>> >
>>> > About the extraction script, DBpedia now uses a predefined folder
>>> > structure for locating dumps / extracting data, and it follows the
>>> > Wikipedia dumps structure [1].
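
A minimal Scala sketch of that folder structure follows. It assumes the layout mirrors dumps.wikimedia.org as `<base>/<wiki>/<date>/<wiki>-<date>-<suffix>`, for example `enwiki/20130403/enwiki-20130403-pages-articles.xml.bz2`. The helper `wikimediaPath`, the base directory and the date are hypothetical and only illustrate the idea. This is not the extraction framework's API.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical helper: builds the path of one dump file, assuming the
// Wikimedia-style layout <base>/<wiki>/<date>/<wiki>-<date>-<suffix>.
def wikimediaPath(base: Path, language: String, date: String, suffix: String): Path = {
  val wiki = language + "wiki" // e.g. "en" -> "enwiki"
  base.resolve(wiki).resolve(date).resolve(s"$wiki-$date-$suffix")
}

val p = wikimediaPath(Paths.get("/data/wikipedia"), "en", "20130403", "pages-articles.xml.bz2")
// On Unix-like systems this prints
// /data/wikipedia/enwiki/20130403/enwiki-20130403-pages-articles.xml.bz2
println(p)
```
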
>>> >
>>> > There are two options here:
>>> > 1) Spotlight adapts its configuration to accommodate that
>>> > 2) DBpedia makes the extraction easier to run with arbitrary MediaWiki
>>> > dumps and output folders.
>>> >
>>> > Maybe (1) is a lot easier, but I'd vote for (2). ;)
>>> > For (2), we need to create two new scripts for download / extraction,
>>> > based on [2] & [3].
>>> > Once we have a volunteer, we can discuss this in detail.
>>>
>>> If the desired new folder/file name structure is reasonably similar, we
>>> don't really need to create new scripts; we basically need to turn
>>> https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/util/Finder.scala
>>> into an interface and provide different implementations: one is the
>>> current finder, the other would be a new one. Finder.scala already
>>> follows the Strategy pattern; now we just have to make it configurable.
>>>
>>> >
>>> > Cheers,
>>> > Dimitris
>>> >
>>> >
>>> > [1] http://dumps.wikimedia.org/
>>> > [2] https://github.com/dbpedia/extraction-framework/blob/master/dump/src/main/scala/org/dbpedia/extraction/dump/extract/Extraction.scala
>>> > [3] https://github.com/dbpedia/extraction-framework/blob/master/dump/src/main/scala/org/dbpedia/extraction/dump/download/Download.scala
>>> >
>>> >
>>> > On Tue, Apr 16, 2013 at 1:29 PM, Joachim Daiber
>>> > <[email protected]> wrote:
>>> >>
>>> >> Hey all,
>>> >>
>>> >> I added this task to the Spotlight ideas. It's small, so it may work
>>> >> best as a warm-up task:
>>> >>
>>> >> ----
>>> >>
>>> >> To create Spotlight models, we need instance_types.nt, redirects.nt
>>> >> and disambiguations.nt. Since we want these to come from the same
>>> >> Wikipedia dump as the one from which we create the model, integrate
>>> >> the DBpedia extraction into the index_db.sh script in DBpedia Spotlight,
>>> >> so that the files are produced automatically during indexing.
>>> >>
>>> >> ----
>>> >>
>>> >> Maybe somebody who knows DEF better than I do could comment on how
>>> >> complicated this would be. We have the Wikipedia dump, and we need the
>>> >> redirects, disambiguation pages and instance types for that same
>>> >> version of the dump.
>>> >>
>>> >> Best,
>>> >> Jo
>>> >>
>>> >>
>>> ------------------------------------------------------------------------------
>>> >> Precog is a next-generation analytics platform capable of advanced
>>> >> analytics on semi-structured data. The platform includes APIs for
>>> building
>>> >> apps and a phenomenal toolset for data science. Developers can use
>>> >> our toolset for easy data analysis & visualization. Get a free
>>> account!
>>> >> http://www2.precog.com/precogplatform/slashdotnewsletter
>>> >> _______________________________________________
>>> >> Dbpedia-gsoc mailing list
>>> >> [email protected]
>>> >> https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Kontokostas Dimitris
>>> >
>>> >
>>> >
>>>
>>
>>
>>
>>
>>
>
>
> --
>
> Pablo N. Mendes
> http://pablomendes.com
>
>
>
>