Hi,
You can grep the output for lines containing http://en.wikipedia.org and
pipe the result to sort -u, which removes the duplicates.
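A minimal sketch of that pipeline, assuming a POSIX shell and a grep that supports -o (GNU and BSD grep both do). The function name extract_wiki_urls is hypothetical, and the pattern assumes article URLs look like http://en.wikipedia.org/wiki/... (wrapped in <...> when the input is raw N-Triples).

```shell
# Hypothetical helper: keep only the English Wikipedia article URLs found on
# standard input and drop the duplicates.
extract_wiki_urls() {
    # grep -o prints only the matched URL, one per line, so the
    # dbpedia.org/resource lines and fragments such as n"@e are discarded.
    # [^>]* stops at the closing angle bracket of an N-Triples IRI.
    # LC_ALL=C makes sort compare plain bytes (usually faster), and sort -u
    # keeps one copy of each line. GNU sort, for one, spills to temporary
    # files when the input does not fit in memory, so the parser itself
    # never has to remember which URLs it has already seen.
    grep -o 'http://en\.wikipedia\.org/wiki/[^>]*' | LC_ALL=C sort -u
}

# Usage (file names are examples):
#   extract_wiki_urls < parser_output.txt > urls
#   bzcat wikipedia_links_en.nt.bz2 | extract_wiki_urls > urls
```

Because it reads standard input, the same function works on the parser's output or straight on the compressed dump, so the 3GB file never has to be decompressed to disk.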
Cheers,
Dimitris
On Sat, Mar 19, 2011 at 3:47 PM, Gabriele Kahlout <[email protected]> wrote:
>
>
> On Sat, Mar 19, 2011 at 2:13 PM, Gabriele Kahlout <[email protected]> wrote:
>
>> Hello,
>>
>> I've downloaded this DBpedia file
>> <http://downloads.dbpedia.org/3.6/en/wikipedia_links_en.nt.bz2> and written
>> a simple parser to extract Wikipedia URLs from it; its output is shown
>> below. The result is unsatisfactory because it contains many duplicates.
>> Adding logic to the parser to avoid them (by remembering every URL already
>> seen) also seems very expensive, since the file is 3GB uncompressed. Is
>> there a better approach to getting Wikipedia URLs, like the one used for
>> dmoz?
>>
>> wget http://rdf.dmoz.org/rdf/content.rdf.u8.gz
>> gunzip content.rdf.u8.gz
>> mkdir dmoz
>> bin/nutch org.apache.nutch.tools.DmozParser content.rdf.u8 -subset 5000 > dmoz/urls
>>
>>
>>
>> http://en.wikipedia.org/wiki/AfghanistanGeography
>> http://dbpedia.org/resource/AfghanistanGeography
>> http://en.wikipedia.org/wiki/AfghanistanGeography
>> n"@e
>> http://dbpedia.org/resource/AfghanistanGeography
>> http://en.wikipedia.org/wiki/AfghanistanGeography
>> http://en.wikipedia.org/wiki/Anarchism
>> http://dbpedia.org/resource/Anarchism
>> http://en.wikipedia.org/wiki/Anarchism
>> n"@e
>> http://dbpedia.org/resource/Anarchism
>> http://en.wikipedia.org/wiki/Anarchism
>> http://en.wikipedia.org/wiki/AccessibleComputing
>> http://dbpedia.org/resource/AccessibleComputing
>> http://en.wikipedia.org/wiki/AccessibleComputing
>> n"@e
>> http://dbpedia.org/resource/AccessibleComputing
>> http://en.wikipedia.org/wiki/AccessibleComputing
>> http://en.wikipedia.org/wiki/AfghanistanHistory
>> http://dbpedia.org/resource/AfghanistanHistory
>> http://en.wikipedia.org/wiki/AfghanistanHistory
>> n"@e
>> http://dbpedia.org/resource/AfghanistanHistory
>> http://en.wikipedia.org/wiki/AfghanistanHistory
>> http://en.wikipedia.org/wiki/AfghanistanPeople
>> http://dbpedia.org/resource/AfghanistanPeople
>> http://en.wikipedia.org/wiki/AfghanistanPeople
>> n"@e
>> http://dbpedia.org/resource/AfghanistanPeople
>> http://en.wikipedia.org/wiki/AfghanistanPeople
>> http://en.wikipedia.org/wiki/AfghanistanTransportations
>> http://dbpedia.org/resource/AfghanistanTransportations
>> http://en.wikipedia.org/wiki/AfghanistanTransportations
>> n"@e
>> http://dbpedia.org/resource/AfghanistanTransportations
>> http://en.wikipedia.org/wiki/AfghanistanTransportations
>> http://en.wikipedia.org/wiki/AfghanistanCommunications
>> http://dbpedia.org/resource/AfghanistanCommunications
>> http://en.wikipedia.org/wiki/AfghanistanCommunications
>>
>>
>> --
>> Regards,
>> K. Gabriele
>>
>> --- unchanged since 20/9/10 ---
>> P.S. If the subject contains "[LON]" or the addressee acknowledges the
>> receipt within 48 hours then I don't resend the email.
>> subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
>> time(x) < Now + 48h) ⇒ ¬resend(I, this).
>>
>> If an email is sent by a sender that is not a trusted contact or the email
>> does not contain a valid code then the email is not received. A valid code
>> starts with a hyphen and ends with "X".
>> ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
>> L(-[a-z]+[0-9]X)).
>>
>>
>
>
>
> ------------------------------------------------------------------------------
> Colocation vs. Managed Hosting
> A question and answer guide to determining the best fit
> for your organization - today and in the future.
> http://p.sf.net/sfu/internap-sfd2d
> _______________________________________________
> Dbpedia-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>
>
--
Kontokostas Dimitris