Amit,
Both sound great! We'd love to have them contributed to the project.
Cheers,
Pablo
On Mon, Mar 19, 2012 at 11:45 AM, Amit Kumar <[email protected]> wrote:
> Hi Pablo,
> For the continuous extraction we are trying to set up a pipeline which
> polls and downloads the Wikipedia data, passes it through DEF (DBpedia
> Extraction Framework) and then creates knowledge bases. Much of the
> plumbing is handled by Yahoo!-internal tools and platforms, but there are
> some pieces which might be useful for the DBpedia community. I'm
> mentioning some below. Let me know if you think you can use any of them;
> if yes, I will contact our Open Source Working Group Manager to take it
> forward.
>
>
> 1. Wiki Downloader: We have two components.
>    - Full Downloader: A basic bash script which polls the latest folder of
>      the Wikipedia dumps, checks whether a new dump is available, and
>      downloads it to a dated folder (a rough sketch of the idea follows
>      this list).
>    - Incremental Downloader: It includes an IRC bot which listens to the
>      Wikipedia IRC channel and builds a list of pages that were updated.
>      It de-duplicates that list and downloads those pages every few hours
>      while respecting the Wikipedia QPS limits (see the second sketch
>      below).
> 2. DEF Wrapper: A bash script which invokes DEF on the data generated by
>    the downloader.
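>
> To make the Full Downloader idea concrete, here is a rough sketch of the
> polling logic, written in Scala for readability even though the real
> component is a bash script. The dump index URL, file name and folder
> layout are placeholders, not our actual setup:
>
> import java.io.File
> import java.net.URL
> import java.nio.file.{Files, Paths, StandardCopyOption}
> import scala.io.Source
>
> // Sketch of the "Full Downloader": look at the dump index and, if a
> // newer dated dump than the one we already have is listed, fetch it
> // into a folder named after that date.
> object FullDownloaderSketch {
>
>   val dumpIndex = "https://dumps.wikimedia.org/enwiki/"  // placeholder
>   val baseDir   = new File("dumps")                      // placeholder
>
>   def main(args: Array[String]): Unit = {
>     // The index page lists dated sub-folders such as 20120301/.
>     val dates = Source.fromURL(dumpIndex).getLines()
>       .flatMap("""href="(\d{8})/""".r.findFirstMatchIn(_))
>       .map(_.group(1))
>       .toSeq.sorted
>
>     val latest = dates.lastOption.getOrElse(sys.exit(0))
>     val target = new File(baseDir, latest)
>
>     if (!target.exists()) {        // a dump we have not downloaded yet
>       target.mkdirs()
>       val name = "enwiki-" + latest + "-pages-articles.xml.bz2"
>       val in   = new URL(dumpIndex + latest + "/" + name).openStream()
>       try Files.copy(in, Paths.get(target.getPath, name),
>                      StandardCopyOption.REPLACE_EXISTING)
>       finally in.close()
>     }
>   }
> }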
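>
> And here is a similarly rough sketch of the Incremental Downloader's
> listening side. The server, channel, nick and message format below are
> assumptions; the real bot also batches and throttles the downloads:
>
> import java.io.{BufferedReader, InputStreamReader, OutputStreamWriter,
>                 PrintWriter}
> import java.net.Socket
> import scala.collection.mutable
>
> // Sketch of the "Incremental Downloader" listener: follow the Wikimedia
> // recent-changes IRC feed, collect titles of edited pages and keep a
> // de-duplicated set that a separate job drains every few hours.
> object IncrementalListenerSketch {
>
>   val TitlePattern = """\[\[(.+?)\]\]""".r
>
>   // The feed embeds mIRC colour codes; strip them before parsing.
>   def stripColours(s: String): String =
>     s.replaceAll("\u0003\\d{0,2}(?:,\\d{1,2})?", "")
>      .replaceAll("[\u0002\u000F\u0016\u001D\u001F]", "")
>
>   def main(args: Array[String]): Unit = {
>     val socket = new Socket("irc.wikimedia.org", 6667)
>     val out = new PrintWriter(
>       new OutputStreamWriter(socket.getOutputStream, "UTF-8"))
>     val in = new BufferedReader(
>       new InputStreamReader(socket.getInputStream, "UTF-8"))
>     def send(cmd: String): Unit =
>       { out.print(cmd + "\r\n"); out.flush() }
>
>     send("NICK def-poller")
>     send("USER def-poller 0 * :DEF incremental poller")
>
>     val changed = mutable.LinkedHashSet[String]()  // de-duplicated titles
>     var line = in.readLine()
>     while (line != null) {
>       if (line.startsWith("PING"))                 // keep connection alive
>         send(line.replaceFirst("PING", "PONG"))
>       else if (line.contains(" 001 "))             // welcome: now join
>         send("JOIN #en.wikipedia")
>       else if (line.contains("PRIVMSG"))
>         for (m <- TitlePattern.findFirstMatchIn(stripColours(line)))
>           changed += m.group(1)
>       line = in.readLine()
>     }
>     // A separate job periodically drains `changed` and downloads those
>     // pages, throttled so the agreed Wikipedia QPS is respected.
>   }
> }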
>
>
> Both of these have some basic notifications and error handling. There are
> a few pieces after DEF, but they are quite internal to Yahoo!.
>
> I think you already have a download.scala which downloads the DBpedia
> dumps. There were a few mails about it last week. If you are facing any
> particular issue with DBpedia Portuguese, do let me know; if we have faced
> the same, we will let you know.
>
> Regards
> Amit
>
>
>
> On 3/19/12 3:45 PM, "Pablo Mendes" <[email protected]> wrote:
>
> Hi Amit,
>
> > "We have been trying to set up an instance of DBpedia to continuously
> > extract data from Wikipedia dumps/updates. While"
>
> We would like to do the same for DBpedia Portuguese. If you can share
> any code, it would be much appreciated.
>
> Cheers
> Pablo
>
> On Mar 19, 2012 10:38 AM, "Amit Kumar" <[email protected]> wrote:
>
> Hi,
> We have been trying to set up an instance of DBpedia to continuously
> extract data from Wikipedia dumps/updates. While going through the output
> we observed that the ImageExtractor was only picking up the first image
> on any page.
>
> I can see commented-out code in the ImageExtractor which seems to pick up
> all images. In its place is code which returns on the first image it
> encounters. My questions are:
>
>
> 1. Does the commented-out code actually work? Does it really pick up all
>    the images on a particular page?
> 2. Why was the change made in the code?
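>
> To make question 1 concrete, here is a toy illustration of the two
> behaviours I mean, written against plain wiki text; it is not the actual
> ImageExtractor code, and the image-link pattern is greatly simplified:
>
> object ImageSelectionSketch {
>   val ImageLink = """\[\[(?:Image|File):([^\]|]+)""".r
>
>   // What we observe today: stop at the first image on the page.
>   def firstImage(wikiText: String): Option[String] =
>     ImageLink.findFirstMatchIn(wikiText).map(_.group(1).trim)
>
>   // What the commented-out code seems to aim for: collect every image.
>   def allImages(wikiText: String): List[String] =
>     ImageLink.findAllMatchIn(wikiText).map(_.group(1).trim).toList
> }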
>
>
>
> Thanks and Regards
> Amit
>
>
>