Hi,

On Tue, Jul 19, 2011 at 03:19, Tommy Chheng <[email protected]> wrote:
> I'm trying to use the WikiParser to determine the category list of a
> Wikipedia page.
You can use org.dbpedia.extraction.mappings.ArticleCategoriesExtractor [1] for
this task. It extracts triples with dc:subject as the predicate.

> The category tags are represented as TextNode objects, but when I print out
> the result of toWikiText, I get an empty string. Should categories be
> TextNodes, and if so, what's the correct way to extract the category name
> from the wiki page?

The category tags are actually InternalLinkNodes, not TextNodes. That may have
been the problem in the code you provided.

Cheers,
Max

[1] http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/extraction_framework/file/3ea1a79638a1/core/src/main/scala/org/dbpedia/extraction/mappings/ArticleCategoriesExtractor.scala

_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
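To illustrate why looking at text nodes yields nothing: a category tag such as [[Category:Programming languages]] is parsed as an internal link, so the categories are found by walking the page tree and keeping only links whose target lies in the Category namespace. The sketch below is a minimal, self-contained illustration of that idea. The node types (WikiNode, PlainText, InternalLink, Page) and the sample page are hypothetical, simplified stand-ins, not the DBpedia wikiparser classes; with the real framework you would match on its InternalLinkNode and inspect the link target instead, or simply use ArticleCategoriesExtractor.

```scala
// Simplified, HYPOTHETICAL model of a parsed wiki page (not the DBpedia classes).
sealed trait WikiNode {
  def children: List[WikiNode]
}

// Plain text; it has no child nodes.
final case class PlainText(text: String) extends WikiNode {
  def children: List[WikiNode] = Nil
}

// An internal link such as [[Category:Programming languages]] or
// [[Java (programming language)|Java]]; the namespace is stored separately
// ("" stands for the main/article namespace in this toy model).
final case class InternalLink(
    namespace: String,
    title: String,
    children: List[WikiNode] = Nil
) extends WikiNode

// Root of the tree.
final case class Page(title: String, children: List[WikiNode]) extends WikiNode

object CategoryExtraction {
  // Pre-order walk: the node itself followed by all of its descendants.
  def allNodes(node: WikiNode): List[WikiNode] =
    node :: node.children.flatMap(allNodes)

  // Category tags are links into the "Category" namespace, not text nodes.
  def categories(page: Page): List[String] =
    allNodes(page).collect { case InternalLink("Category", title, _) => title }

  // Hypothetical parse of: "Scala runs on the [[Java (programming language)|Java]]
  // platform. [[Category:Programming languages]] [[Category:JVM programming languages]]"
  val sample: Page = Page(
    "Scala (programming language)",
    List(
      PlainText("Scala runs on the "),
      InternalLink("", "Java (programming language)", List(PlainText("Java"))),
      PlainText(" platform."),
      InternalLink("Category", "Programming languages"),
      InternalLink("Category", "JVM programming languages")
    )
  )

  def main(args: Array[String]): Unit =
    println(categories(sample)) // prints List(Programming languages, JVM programming languages)
}
```

The ordinary article link to Java is skipped because its namespace is not "Category", while both category tags are returned in document order.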
