Hi folks, we also need to extract from List_of_X pages, so we’d like to
collaborate. Some notes:
- we plan to “convert” List_of_X to a category X: i.e. each item in the list
will get a category assignment X (e.g. List_of_Christmas_foods ->
category:Christmas_foods)
- Rather than creating category X blindly, we should first search for an
existing one using some simple heuristics (e.g. the list page itself is
assigned to a similarly-named category, or a similarly-named category exists
elsewhere).
- IMHO it’s only safe to treat the first link in each list item this way (e.g.
in “* [[Pudding]], traditionally made in the [[Australian Bush]]”, only
[[Pudding]] should get the category).
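To make the first-link idea concrete, here is a minimal Python sketch. It is
not DBpedia Extraction Framework code: the helper names `category_for_list` and
`first_links`, the regex, and the second sample item are hypothetical and only
illustrate the heuristic described above.

```python
import re

# Matches [[Target]], [[Target|label]] or [[Target#section|label]];
# only the target part is captured.
LINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def category_for_list(list_title):
    """Derive a candidate category name from a 'List_of_X' title.

    Returns None when the title does not follow the List_of_X pattern.
    Whether that category really exists still has to be checked.
    """
    prefix = "List_of_"
    if not list_title.startswith(prefix) or len(list_title) == len(prefix):
        return None
    return "category:" + list_title[len(prefix):]


def first_links(wikitext):
    """Return the first link target of every '*' list item, in order."""
    targets = []
    for line in wikitext.splitlines():
        stripped = line.lstrip()
        if not stripped.startswith("*"):
            continue  # not a list item
        match = LINK_RE.search(stripped)
        if match:
            targets.append(match.group(1).strip().replace(" ", "_"))
    return targets


sample = (
    "* [[Pudding]], traditionally made in the [[Australian Bush]]\n"
    "* [[Mince pie|Mince pies]]\n"
    "Plain text with a [[Stray link]]\n"
)

category = category_for_list("List_of_Christmas_foods")
for target in first_links(sample):
    print(target, "->", category)
```

Only `Pudding` and `Mince_pie` are picked up here; `Australian Bush` and the
link outside any list item are ignored, which is the conservative behaviour
suggested above.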
What’s your progress on this task?
Will you be addressing multiple languages?
Please email the 3 of us at Ontotext, thanks!
From: Nico Ring [mailto:nico.r...@student.hpi.de]
Sent: Wednesday, May 20, 2015 3:50 PM
To: dbpedia-developers@lists.sourceforge.net
Cc: Mischkewitz, Sven; Fabian Windheuser; Patrick Kuhn
Subject: [Dbpedia-developers] Questions about DBpedia Extraction Framework
Hi all,
we are students of the Hasso Plattner Institute and take part in a seminar of
the Semantic Web chair. Our task is to extract information from Wikipedia “List
of” pages to DBpedia. Therefore we thought about using the
DBpediaExtractionFramework.
We have some questions regarding the framework:
* We use IntelliJ for development. If we place a breakpoint in the
extraction method of our extractor and debug our Maven goal, the debugger
doesn’t stop. We already tried using mvnDebug and attaching to it with the
remote debugger from IntelliJ. Is there anything else we need to do to debug
the framework?
* A dataset needs to be set for each Extractor. What is the purpose of
it?
* Is there a way to add state or some static information to the extraction
process? It seems to us that the context object does something like that,
but we don’t really understand where its content comes from or how to add new
objects to it.
* We also want to extract List_of pages that are in table format. We
found the classes `TableNode, TableRowNode, TableCellNode`, which we would like
to use. But if we extend `PageNodeExtractor`, the tables don’t get wrapped in
these classes; they are just TextNodes and InternalLinkNodes. There is a class
called TableMapping, which looks handy, but we don’t know if and how we could
use it.
* Is there a way to post-process the results?
Thanks in advance for answering all the questions.
Kind Regards,
Patrick Kuhn, Fabian Windheuser, Sven Mischkewitz and Nico Ring