Just committed a script that resolves redirects for URIs in object
position in .nt and .nq files.
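
For readers following along, here is a minimal Python sketch of the idea,
not the committed script itself. It assumes the redirects are already
loaded into a dict mapping source URI to target URI, and that every input
line is a well-formed N-Triples or N-Quads statement. The names
resolve_object and resolve_lines are hypothetical.

```python
import re

# Subject and predicate never contain whitespace in N-Triples/N-Quads, so the
# third token is the object. Only rewrite it when it is an IRI (<...>);
# literals start with a double quote and are left alone.
IRI_OBJECT = re.compile(r'^(\S+\s+\S+\s+)<([^>]*)>(.*)$')


def resolve_object(line, redirects):
    """Return `line` with its object IRI replaced by its redirect target, if any."""
    if line.startswith("#"):  # comment line
        return line
    match = IRI_OBJECT.match(line)
    if match is None:  # literal or blank-node object, blank line, ...
        return line
    head, obj, tail = match.groups()
    return f"{head}<{redirects.get(obj, obj)}>{tail}"


def resolve_lines(lines, redirects):
    """Lazily rewrite an iterable of lines (e.g. an open file), keeping newlines."""
    for line in lines:
        body = line.rstrip("\n")
        yield resolve_object(body, redirects) + line[len(body):]
```

Subjects and the graph term of a quad are deliberately left untouched in
this sketch, matching the "object position" scope described above.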

Cheers,
Max
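
Pablo's point in the quoted thread below (A redirects_to B redirects_to C
-> A redirects_to C) can be illustrated with a short Python sketch. This is
an assumption about one reasonable way to do it, not a description of any
script in the repository. The function names are hypothetical, and the
policy for redirect cycles (leave the URI unchanged) is my own choice.

```python
def final_target(start, redirects):
    """Follow the redirect chain from `start` to its last hop.

    Returns `start` unchanged if the chain loops back on itself.
    """
    seen = {start}
    current = start
    while current in redirects:
        nxt = redirects[current]
        if nxt in seen:  # cycle (including a self-redirect): give up
            return start
        seen.add(nxt)
        current = nxt
    return current


def close_redirects(redirects):
    """Map every redirecting URI directly to the end of its chain."""
    closed = {}
    for source in redirects:
        target = final_target(source, redirects)
        if target != source:  # drop entries that only lead into a cycle
            closed[source] = target
    return closed
```

With the closed map, a single dictionary lookup per URI is enough when
rewriting the dumps, however long the original chain was.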

On Fri, Apr 15, 2011 at 17:23, Dimitris Kontokostas <[email protected]> wrote:
> The DBpediaResourceFactory seems better, but the redirects can be very big
> (the English file alone is ~750MB). I don't know how the framework can
> handle that much data.
>
> You're right about the transitive issue, I hadn't thought of it :)
> You just found a bug in a new script I am creating :)
> Anyway, this can be worked out (somehow, I guess).
>
> Cheers,
> Dimitris
>
> On Fri, Apr 15, 2011 at 5:57 PM, Pablo Mendes <[email protected]> wrote:
>>
>> I was thinking it could slowly evolve into a sort of DBpediaResourceFactory
>> class at the core of the workflow that knows everything about transforming
>> Wikipedia Page URLs into DBpedia Resource URIs/IRIs (including
>> language-specific knowledge, redirects, etc.)
>> But, yes, sure. Your solution sounds simple and efficient. :)
>> Keep in mind that redirects.nt may need some treatment to compute the
>> transitive closure (A redirects_to B redirects_to C -> A redirects_to C).
>> Cheers,
>> Pablo
>> On Fri, Apr 15, 2011 at 3:45 PM, Dimitris Kontokostas <[email protected]>
>> wrote:
>>>
>>> A new extractor would be too expensive.
>>> I think a script can do the job just fine.
>>>
>>> It will use redirects.nt as a look-up table and replace all
>>> occurrences in the extraction dumps.
>>>
>>> cheers,
>>> Dimitris
>>>
>>> On Fri, Apr 15, 2011 at 4:10 PM, Pablo Mendes <[email protected]>
>>> wrote:
>>>>>
>>>>> I like the second approach ... if we could use a unique URI to denote
>>>>> the same entity, we are better off.
>>>>
>>>> Yep. The disadvantage is that it is intrusive (requires access to
>>>> DBpedia extraction). Luckily, DBpedia is an open source project to which
>>>> any of us can contribute. Better yet, you can adapt similar code from
>>>> DBpedia Spotlight into a DBpedia extractor and contribute it to the
>>>> project. It should be in: org.dbpedia.spotlight.util.SurrogatesUtil.scala
>>>>
>>>> (http://dbp-spotlight.svn.sourceforge.net/viewvc/dbp-spotlight/trunk/core/src/main/scala/)
>>>> I will make sure to bug the leader of the next release to include it. :)
>>>> Cheers,
>>>> Pablo
>>>> On Fri, Apr 15, 2011 at 2:43 PM, Lushan Han <[email protected]> wrote:
>>>>>
>>>>> I like the second approach -- resolving the problem at extraction
>>>>> time. Inference over large amounts of data is still difficult. If we
>>>>> could use a unique URI to denote the same entity, we are better off.
>>>>>
>>>>> Thank you all for the immediate responses,
>>>>> Lushan Han
>>>>>
>>>>>
>>>>> On Thu, Apr 14, 2011 at 4:37 AM, Pablo Mendes <[email protected]>
>>>>> wrote:
>>>>> > Maybe what Dimitris says is that this query would indeed be answered if:
>>>>> > - redirects were treated as sameAs and inference was used (works for
>>>>> >   this but not all cases)
>>>>> > - the framework used redirects to do identity resolution at
>>>>> >   extraction time
>>>>> >
>>>>> > Also, I should point out that you can probably sort this problem out
>>>>> > with a simple Silk link spec.
>>>>> >
>>>>> > Cheers
>>>>> > Pablo
>>>>> >
>>>>> > On Apr 13, 2011 3:12 PM, "Lushan Han" <[email protected]> wrote:
>>>>> >> Hi Dimitris,
>>>>> >>
>>>>> >> I am afraid that you did not completely see my point. It is not
>>>>> >> simply a redirection problem.
>>>>> >> For example, suppose I want to ask, as a SPARQL query: what is the
>>>>> >> birth date of the architect who designed the Brooklyn Bridge?
>>>>> >>
>>>>> >> PREFIX : <http://dbpedia.org/resource/>
>>>>> >> PREFIX dbo: <http://dbpedia.org/ontology/>
>>>>> >>
>>>>> >> SELECT ?person ?date WHERE {
>>>>> >>   :Brooklyn_Bridge dbo:architect ?person .
>>>>> >>   ?person dbo:birthDate ?date .
>>>>> >> }
>>>>> >>
>>>>> >> It should return the correct answer. However, there is no
>>>>> >> result. The problem is caused by the redirection.
>>>>> >>
>>>>> >> I am curious: even the Wikipedia article doesn't use the
>>>>> >> redirection, so why does the corresponding DBpedia article use it?
>>>>> >>
>>>>> >>
>>>>> >> Best regards,
>>>>> >> Lushan Han
>>>>> >>
>>>>> >> On Wed, Apr 13, 2011 at 5:23 AM, Dimitris Kontokostas
>>>>> >> <[email protected]>
>>>>> >> wrote:
>>>>> >>> Hi,
>>>>> >>>
>>>>> >>> The Wikipedia article about John_Augustus_Roebling (1) redirects to
>>>>> >>> John_A._Roebling (2);
>>>>> >>> that is why you cannot find any information for (1).
>>>>> >>>
>>>>> >>> The Brooklyn Bridge article links to the redirect article.
>>>>> >>>
>>>>> >>> Although this is not a bug, it could be resolved in the extraction
>>>>> >>> framework by replacing all redirects with the proper articles.
>>>>> >>> A shell script could do the job. Any ideas / comments?
>>>>> >>>
>>>>> >>> Cheers,
>>>>> >>> Dimitris
>>>>> >>>
>>>>> >>> On Tue, Apr 12, 2011 at 11:22 PM, Lushan Han <[email protected]>
>>>>> >>> wrote:
>>>>> >>>>
>>>>> >>>> Hi,
>>>>> >>>>
>>>>> >>>> It surprised me that a DBpedia URI is not consistent with its
>>>>> >>>> corresponding Wikipedia URI. The Wikipedia URI is
>>>>> >>>> http://en.wikipedia.org/wiki/John_Augustus_Roebling. Its
>>>>> >>>> corresponding URI in DBpedia is
>>>>> >>>> http://dbpedia.org/page/John_A._Roebling. I think we need to
>>>>> >>>> resolve this issue because I found that it breaks links in the data.
>>>>> >>>> For example, from http://dbpedia.org/page/Brooklyn_Bridge, you can
>>>>> >>>> see that its dbpedia-owl:architect is dbpedia:John_Augustus_Roebling.
>>>>> >>>> However, when I queried the rdf:type of
>>>>> >>>> dbpedia:John_Augustus_Roebling using the SPARQL endpoint, it gave me
>>>>> >>>> no result. The reason is that there is no
>>>>> >>>> dbpedia:John_Augustus_Roebling but instead dbpedia:John_A._Roebling.
>>>>> >>>>
>>>>> >>>> I don't know how many other such URIs exist.
>>>>> >>>>
>>>>> >>>> Best regards,
>>>>> >>>> Lushan Han
>>>>> >>>>
>>>
>>>
>>> --
>>> Kontokostas Dimitris
>>
>
>
>
> --
> Kontokostas Dimitris
>
> _______________________________________________
> Dbpedia-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>
>
