On Sat, Mar 19, 2011 at 2:13 PM, Gabriele Kahlout
<[email protected]> wrote:

> Hello,
>
> I've downloaded this DBpedia file
> <http://downloads.dbpedia.org/3.6/en/wikipedia_links_en.nt.bz2> and written
> a simple parser to extract Wikipedia URLs from it; sample output is shown
> below. I find the result unsatisfactory since it contains many duplicates.
> Adding logic to the parser to avoid them (by remembering every URL already
> seen) also seems very expensive, since the uncompressed file is 3GB. Is
> there a better approach to getting Wikipedia URLs, as is done for dmoz with
> the following commands?
>
> wget http://rdf.dmoz.org/rdf/content.rdf.u8.gz
> bin/nutch org.apache.nutch.tools.DmozParser content.rdf.u8 -subset 5000 > dmoz/urls
>
>
>
> http://en.wikipedia.org/wiki/AfghanistanGeography
> http://dbpedia.org/resource/AfghanistanGeography
> http://en.wikipedia.org/wiki/AfghanistanGeography
> n"@e
> http://dbpedia.org/resource/AfghanistanGeography
> http://en.wikipedia.org/wiki/AfghanistanGeography
> http://en.wikipedia.org/wiki/Anarchism
> http://dbpedia.org/resource/Anarchism
> http://en.wikipedia.org/wiki/Anarchism
> n"@e
> http://dbpedia.org/resource/Anarchism
> http://en.wikipedia.org/wiki/Anarchism
> http://en.wikipedia.org/wiki/AccessibleComputing
> http://dbpedia.org/resource/AccessibleComputing
> http://en.wikipedia.org/wiki/AccessibleComputing
> n"@e
> http://dbpedia.org/resource/AccessibleComputing
> http://en.wikipedia.org/wiki/AccessibleComputing
> http://en.wikipedia.org/wiki/AfghanistanHistory
> http://dbpedia.org/resource/AfghanistanHistory
> http://en.wikipedia.org/wiki/AfghanistanHistory
> n"@e
> http://dbpedia.org/resource/AfghanistanHistory
> http://en.wikipedia.org/wiki/AfghanistanHistory
> http://en.wikipedia.org/wiki/AfghanistanPeople
> http://dbpedia.org/resource/AfghanistanPeople
> http://en.wikipedia.org/wiki/AfghanistanPeople
> n"@e
> http://dbpedia.org/resource/AfghanistanPeople
> http://en.wikipedia.org/wiki/AfghanistanPeople
> http://en.wikipedia.org/wiki/AfghanistanTransportations
> http://dbpedia.org/resource/AfghanistanTransportations
> http://en.wikipedia.org/wiki/AfghanistanTransportations
> n"@e
> http://dbpedia.org/resource/AfghanistanTransportations
> http://en.wikipedia.org/wiki/AfghanistanTransportations
> http://en.wikipedia.org/wiki/AfghanistanCommunications
> http://dbpedia.org/resource/AfghanistanCommunications
> http://en.wikipedia.org/wiki/AfghanistanCommunications
>
>
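
In the sample output above, the repeats of each URL sit next to each other, so
it may be enough to remember only the last URL printed instead of every URL
seen. Below is a minimal sketch of that idea, not a tested replacement for your
parser. It assumes the dump is N-Triples with IRIs in angle brackets, and that
all lines for one article are adjacent throughout the file (I have only the
sample above to go on). The function name unique_wikipedia_urls and the script
name wiki_urls.py are made up for illustration.

```python
import bz2
import re
import sys

# Matches any English Wikipedia article IRI inside an N-Triples line.
WIKI_IRI = re.compile(r"<(http://en\.wikipedia\.org/wiki/[^>]*)>")


def unique_wikipedia_urls(lines):
    """Yield each Wikipedia URL once, ASSUMING duplicates are adjacent.

    Only the previously emitted URL is kept, so memory use is constant.
    """
    previous = None
    for line in lines:
        for url in WIKI_IRI.findall(line):
            if url != previous:
                previous = url
                yield url


def main(path):
    # Decompress as a stream, so the 3GB of text is never unpacked to disk
    # or held in memory all at once.
    with bz2.open(path, "rt", encoding="utf-8") as dump:
        for url in unique_wikipedia_urls(dump):
            print(url)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
```

It could be run as

python wiki_urls.py wikipedia_links_en.nt.bz2 > wikipedia/urls

If the adjacency assumption turns out not to hold for the whole file, piping
the output through sort -u would remove the remaining duplicates, since sort
spills to temporary files rather than keeping everything in memory.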
> --
> Regards,
> K. Gabriele
>
> --- unchanged since 20/9/10 ---
> P.S. If the subject contains "[LON]" or the addressee acknowledges the
> receipt within 48 hours then I don't resend the email.
> subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
> time(x) < Now + 48h) ⇒ ¬resend(I, this).
>
> If an email is sent by a sender that is not a trusted contact or the email
> does not contain a valid code then the email is not received. A valid code
> starts with a hyphen and ends with "X".
> ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
> L(-[a-z]+[0-9]X)).
>
>
_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
