Just committed a script that resolves redirects for URIs in object position in .nt and .nq files.
Cheers,
Max

On Fri, Apr 15, 2011 at 17:23, Dimitris Kontokostas <[email protected]> wrote:
> the DBpediaResourceFactory seems better but redirects can be very big (only
> the English one is ~750MB). I don't know how the framework can handle such
> big data
>
> You're right about the transitive issue, I haven't thought of it :)
> you just found a bug in a new script i am creating :)
> anyway, this can be worked out (somehow I guess)
>
> Cheers,
> Dimitris
>
> On Fri, Apr 15, 2011 at 5:57 PM, Pablo Mendes <[email protected]> wrote:
>>
>> I was thinking it could slowly evolve to a sort of DBpediaResourceFactory
>> class at the core of the workflow who knew everything about transforming
>> Wikipedia Page URLs into DBpedia Resource URIs/IRIs (including
>> language-specific knowledge, redirects, etc.)
>> But, yes, sure. Your solution sounds simple and efficient. :)
>> Keep in mind that redirects.nt may need some treatment to compute the
>> transitive closure (A redirects_to B redirects_to C -> A redirects_to C).
>> Cheers,
>> Pablo
>>
>> On Fri, Apr 15, 2011 at 3:45 PM, Dimitris Kontokostas <[email protected]>
>> wrote:
>>>
>>> A new extractor will be too expensive
>>> i think a script can do the job just fine
>>>
>>> it will have the redirects.nt as a look-up table and replace all
>>> occurrences in the extraction dumps
>>>
>>> cheers,
>>> Dimitris
>>>
>>> On Fri, Apr 15, 2011 at 4:10 PM, Pablo Mendes <[email protected]>
>>> wrote:
>>>>>
>>>>> I like the second approach ... if we could use a unique URI to denote
>>>>> the same entity, we are better off.
>>>>
>>>> Yep. The disadvantage is that it is intrusive (requires access to
>>>> DBpedia extraction). Luckily, DBpedia is an open source project to which
>>>> any of us can contribute. Better yet, you can adapt similar code from
>>>> DBpedia Spotlight into a DBpedia extractor and contribute it to the project.
>>>> It should be in: org.dbpedia.spotlight.util.SurrogatesUtil.scala
>>>>
>>>> (http://dbp-spotlight.svn.sourceforge.net/viewvc/dbp-spotlight/trunk/core/src/main/scala/)
>>>> I will make sure to bug the leader of the next release to include it. :)
>>>> Cheers,
>>>> Pablo
>>>>
>>>> On Fri, Apr 15, 2011 at 2:43 PM, Lushan Han <[email protected]> wrote:
>>>>>
>>>>> I like the second approach -- resolving the problem at extraction
>>>>> time. Inference with large amount of data is still difficult. If we
>>>>> could use a unique URI to denote the same entity, we are better off.
>>>>>
>>>>> Thank you all for immediate response,
>>>>> Lushan Han
>>>>>
>>>>>
>>>>> On Thu, Apr 14, 2011 at 4:37 AM, Pablo Mendes <[email protected]>
>>>>> wrote:
>>>>> > Maybe what Dimitris says is that this query would indeed be answered if:
>>>>> > - redirects were treated as sameAs and inference was used (works for
>>>>> >   this but not all cases)
>>>>> > - the framework used redirects to do identity resolution at
>>>>> >   extraction time
>>>>> >
>>>>> > Also, i should point out that you can probably sort this problem out
>>>>> > with a simple Silk link spec.
>>>>> >
>>>>> > Cheers
>>>>> > Pablo
>>>>> >
>>>>> > On Apr 13, 2011 3:12 PM, "Lushan Han" <[email protected]> wrote:
>>>>> >> Hi Dimitris,
>>>>> >>
>>>>> >> I am afraid that you did not completely see my point. It is not
>>>>> >> simply a redirection problem.
>>>>> >> For example, if I want to make a SPARQL query -- what is the birth
>>>>> >> date of the architect who designed the Brooklyn Bridge?
>>>>> >>
>>>>> >> PREFIX dbo: <http://dbpedia.org/ontology/>
>>>>> >>
>>>>> >> SELECT ?person, ?date WHERE {
>>>>> >> :Brooklyn_Bridge dbo:architect ?person .
>>>>> >> ?person dbo:birthDate ?date .
>>>>> >> }
>>>>> >>
>>>>> >> It should be able to return the correct answer. However, there is no
>>>>> >> result. The problem is caused by the redirection.
>>>>> >>
>>>>> >> I am curious that even the Wikipedia article doesn't use the
>>>>> >> redirection. Why does the corresponding DBpedia article use it?
>>>>> >>
>>>>> >>
>>>>> >> Best regards,
>>>>> >> Lushan Han
>>>>> >>
>>>>> >> On Wed, Apr 13, 2011 at 5:23 AM, Dimitris Kontokostas
>>>>> >> <[email protected]> wrote:
>>>>> >>> Hi,
>>>>> >>>
>>>>> >>> The Wikipedia article about John_Augustus_Roebling (1) redirects to
>>>>> >>> John_A._Roebling (2),
>>>>> >>> which is why you cannot find any information for (1).
>>>>> >>>
>>>>> >>> The Brooklyn Bridge article links to the redirect article.
>>>>> >>>
>>>>> >>> Although this is not a bug, it could be resolved in the extraction
>>>>> >>> framework by replacing all redirects with the proper articles.
>>>>> >>> A shell script could do the job, any ideas / comments?
>>>>> >>>
>>>>> >>> Cheers,
>>>>> >>> Dimitris
>>>>> >>>
>>>>> >>> On Tue, Apr 12, 2011 at 11:22 PM, Lushan Han <[email protected]>
>>>>> >>> wrote:
>>>>> >>>>
>>>>> >>>> Hi,
>>>>> >>>>
>>>>> >>>> It surprised me that a DBpedia URI is not consistent with its
>>>>> >>>> corresponding Wikipedia URI. This is
>>>>> >>>> http://en.wikipedia.org/wiki/John_Augustus_Roebling. Its
>>>>> >>>> corresponding URI in DBpedia is
>>>>> >>>> http://dbpedia.org/page/John_A._Roebling. I think we need to
>>>>> >>>> resolve this issue because I found it breaks links in the data. For
>>>>> >>>> example, from http://dbpedia.org/page/Brooklyn_Bridge, you can
>>>>> >>>> see that its dbpedia-owl:architect is
>>>>> >>>> dbpedia:John_Augustus_Roebling. However, when I query the rdf:type
>>>>> >>>> of dbpedia:John_Augustus_Roebling using the SPARQL endpoint, it
>>>>> >>>> gives me no result. The reason is that there is no
>>>>> >>>> dbpedia:John_Augustus_Roebling but instead
>>>>> >>>> dbpedia:John_A._Roebling.
>>>>> >>>>
>>>>> >>>> I don't know how many other such URIs exist.
>>>>> >>>>
>>>>> >>>> Best regards,
>>>>> >>>> Lushan Han
>>>>> >>>>
>>>
>>> --
>>> Kontokostas Dimitris
>
> --
> Kontokostas Dimitris
>
> _______________________________________________
> Dbpedia-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
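
For readers following the thread, here is a rough Python sketch of the approach discussed above: load redirects.nt as a look-up table, collapse redirect chains (A redirects_to B redirects_to C becomes A redirects_to C), and rewrite URIs in object position in .nt/.nq lines. This is not the committed script. The function names, the regex-based line parsing, and the choice to hold the whole table in an in-memory dict are all assumptions made for illustration, and the redirect predicate is simply ignored rather than checked.

```python
import re

# One N-Triples/N-Quads statement whose subject, predicate, and object are all
# URIs. Statements with literal or blank-node terms do not match and are left
# untouched. (Simplified parsing; an assumption, not a full N-Triples parser.)
URI = r"<([^>]*)>"
LINE = re.compile(rf"^\s*{URI}\s+{URI}\s+{URI}(.*)$")


def load_redirects(lines):
    """Build {source URI: target URI} from redirect triples (predicate ignored)."""
    redirects = {}
    for line in lines:
        m = LINE.match(line)
        if m:
            redirects[m.group(1)] = m.group(3)
    return redirects


def resolve(uri, redirects):
    """Follow a redirect chain to its end; stop if the chain loops back."""
    seen = {uri}
    while uri in redirects:
        uri = redirects[uri]
        if uri in seen:  # cycle such as A -> B -> A
            break
        seen.add(uri)
    return uri


def close_redirects(redirects):
    """Map every redirect source directly to the end of its chain."""
    return {src: resolve(src, redirects) for src in redirects}


def rewrite_objects(lines, closed):
    """Yield lines with redirected object URIs replaced by their final targets."""
    for line in lines:
        m = LINE.match(line)
        if m and m.group(3) in closed:
            s, p, o, rest = m.groups()
            # 'rest' holds the optional graph URI (N-Quads) and the final dot.
            yield f"<{s}> <{p}> <{closed[o]}>{rest}\n"
        else:
            yield line


if __name__ == "__main__":
    # The predicate URIs below are placeholders; the URIs are from the thread.
    redirect_lines = [
        "<http://dbpedia.org/resource/John_Augustus_Roebling> "
        "<http://example.org/redirect> "
        "<http://dbpedia.org/resource/John_A._Roebling> .\n",
    ]
    dump_lines = [
        "<http://dbpedia.org/resource/Brooklyn_Bridge> "
        "<http://dbpedia.org/ontology/architect> "
        "<http://dbpedia.org/resource/John_Augustus_Roebling> .\n",
    ]
    closed = close_redirects(load_redirects(redirect_lines))
    for out in rewrite_objects(dump_lines, closed):
        print(out, end="")
```

With a redirects file as large as the ~750MB English one mentioned above, an in-memory dict may or may not be practical; a disk-backed key-value store could be substituted behind the same look-up interface.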
