Hi Omri,

DBpedia parses link texts, but later doesn't use them (I think). You will have to change the code a little.
You could make a copy of ExternalLinksExtractor [1] and adapt it to your needs: go through all the external links and instead of link.destination call link.toPlainText. See ExternalLinkNode [2]. Or maybe even better, add a configuration setting to ExternalLinksExtractor that determines if link texts should also be extracted.

What should the RDF triples that you generate look like?

Before you start, you should read this page:

https://github.com/dbpedia/extraction-framework/wiki/Contributing

This way, you can send us a pull request when you are done, and if we like your changes, we can incorporate them and everyone can reap the benefits. :-)

Cheers,
JC

[1] https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/mappings/ExternalLinksExtractor.scala
[2] https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/wikiparser/LinkNode.scala

On 3 April 2013 14:17, Omri Oren <[email protected]> wrote:
> Hello discussion group :)
>
> I'd like to have a list of external links of wiki pages (the ones that lead
> to external sites, at the end of each article), but also to have the text of
> the hyperlink as it appears in the article.
>
> For example, for the 2 External Links at the bottom of
> http://en.wikipedia.org/wiki/Brazilian_Olympic_Committee I'd like to have
> the strings "Official website" and "Website for the 2004 Summer Olympic
> Games"
>
> I know there's an ExternalLinks extractor, but how do I extract the
> hyperlink text too?
>
> Cheers,
> Omri
>
> _______________________________________________
> Dbpedia-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
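
The approach suggested in the reply (walk the external links and read each link's text as well as its destination, behind a configuration switch) could look roughly like the sketch below. It is only an illustration under stated assumptions: the node classes are simplified stand-ins written for this example, not the real classes from the extraction framework, and the Config switch, the externalLinks helper and the example.org URLs are hypothetical.

```scala
import java.net.URI

// Stand-in types, NOT the real DBpedia classes: they only mimic the general
// shape of a parsed page (a tree in which external links have a destination
// and child nodes holding the link text).
sealed trait Node {
  def children: List[Node]
  // Concatenate the plain text of all descendants.
  def toPlainText: String = children.map(_.toPlainText).mkString
}
final case class TextNode(text: String) extends Node {
  val children: List[Node] = Nil
  override def toPlainText: String = text
}
final case class ExternalLinkNode(destination: URI, children: List[Node]) extends Node
final case class SectionNode(children: List[Node]) extends Node

object LinkTextSketch {
  // Hypothetical switch, analogous to the configuration setting proposed above.
  final case class Config(extractLinkTexts: Boolean)

  // Collect every external link in the tree; include its text only when enabled.
  def externalLinks(node: Node, config: Config): List[(URI, Option[String])] =
    node match {
      case link: ExternalLinkNode =>
        val text = if (config.extractLinkTexts) Some(link.toPlainText.trim) else None
        List((link.destination, text))
      case other =>
        other.children.flatMap(externalLinks(_, config))
    }

  // Sample tree using the two link texts from the question; the URLs are placeholders.
  val samplePage: Node = SectionNode(List(
    ExternalLinkNode(new URI("http://example.org/official"),
      List(TextNode("Official website"))),
    ExternalLinkNode(new URI("http://example.org/2004"),
      List(TextNode("Website for the 2004 Summer Olympic Games")))
  ))
}
```

Calling LinkTextSketch.externalLinks(LinkTextSketch.samplePage, LinkTextSketch.Config(extractLinkTexts = true)) pairs each destination with its link text; with the switch off, only the destinations are filled in. How the pairs are then turned into RDF triples is left open here, since that depends on the answer to the question above.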
