Sorry, but I really don't understand how the AST works (or Scala, for that
matter). I am trying to retrieve all the PropertyNode objects contained in a
PageNode, so I wrote:
override def extract(page: PageNode, subjectUri: String, pageContext: PageContext): Seq[Quad] = {
  if (page.title.namespace != Namespace.Template || page.isRedirect ||
      !page.title.decoded.contains("évolution population")) return Seq.empty
  for (node <- page.children) {
    for (property <- allPropertiesNode(node)) {
      println(property.toWikiText)
    }
  }
}

private def allPropertiesNode(node : Node) : List[PropertyNode] = {
  node match {
    case propertyNode : PropertyNode => List(propertyNode)
    case _ => node.children
  }
}
And nothing is displayed on my screen :-(
Any idea what I am doing wrong?
Best.
Julien.
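
One thing that stands out in the snippet above is that allPropertiesNode never
calls itself, so a PropertyNode nested inside another node (for example under
the "#switch" ParserFunctionNode described in the quoted reply below) would not
be reached. Whether that is the actual cause of the empty output is not certain,
since the early return in extract may also be filtering the page out. The sketch
below shows a recursive walk. It is only an illustration: the Node, TextNode,
PropertyNode, ParserFunctionNode and PageNode classes here are simplified,
hypothetical stand-ins written for this example, not the framework's real
classes, and the sample tree is hand-built from the values mentioned in the
thread.

```scala
// Hypothetical, simplified stand-ins for the framework's wikiparser classes.
// Assumption: the real classes expose `children` in a comparable way.
sealed trait Node { def children: List[Node] }
case class TextNode(text: String) extends Node { val children: List[Node] = Nil }
case class PropertyNode(key: String, children: List[Node]) extends Node
case class ParserFunctionNode(title: String, children: List[Node]) extends Node
case class PageNode(children: List[Node]) extends Node

object PropertyWalk {
  // Collect every PropertyNode at any depth, including `node` itself.
  def allProperties(node: Node): List[PropertyNode] = node match {
    case p: PropertyNode => p :: p.children.flatMap(allProperties)
    case other           => other.children.flatMap(allProperties)
  }

  // Concatenate the text nodes directly under a property and trim the result.
  def textOf(p: PropertyNode): String =
    p.children.collect { case TextNode(t) => t }.mkString.trim

  // Hand-built sample tree: a "#switch" with one branch per key.
  val page: PageNode = PageNode(List(
    ParserFunctionNode("#switch", List(
      PropertyNode("an", List(TextNode("2010"))),
      PropertyNode("pop", List(TextNode("61793")))
    ))
  ))

  // Key -> value pairs found anywhere in the sample page.
  val found: Map[String, String] =
    allProperties(page).map(p => p.key -> textOf(p)).toMap
}
```

With the real classes, the same flatMap-based recursion could replace the
`case _ => node.children` branch, and extract would still need to end with a
Seq[Quad] value (for instance Seq.empty while debugging), because a for loop
without yield returns Unit.
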
2013/4/23 Julien Plu <[email protected]>
> Hi,
>
> "param" comes from a bad copy-paste; the correct variable is "pop".
>
> By the way, thank you for the hint about the AST. I will take a look at these
> classes and see how I can use them. I won't hesitate to ask if I get stuck :-)
>
> Best.
>
> Julien.
>
>
> 2013/4/22 Jona Christopher Sahnwaldt <[email protected]>
>
>> Hi Julien,
>>
>> On 22 April 2013 21:43, Julien Plu <[email protected]>
>> wrote:
>> > I started the code for the extractor and I have a problem with the regex in
>> > Scala. The string is:
>> >
>> http://fr.wikipedia.org/w/index.php?title=Mod%C3%A8le:Donn%C3%A9es/Antony/%C3%A9volution_population&action=edit
>> >
>> > And my regex is : val populationRegex = """|pop=(\d+)""".r
>> >
>> > And I use this piece of code :
>> >
>> > populationRegex findAllIn page.children.toString foreach (_ match {
>> >   case populationRegex(pop) => println(page.title.decoded + " : pop : " + param)
>>
>> What is param?
>>
>> But more generally - did you try using the AST (abstract syntax tree)
>> built by the parser, i.e. the tree whose root node is the PageNode?
>> I'm not sure how good our parser is at dealing with stuff like
>> "<includeonly>" and "{{#switch ...}}", but I think it works and
>> page.children should contain a ParserFunctionNode [1] object for the
>> #switch, which in turn has a child for each branch, e.g. one child for
>> an=2010 and one for pop=61793. These children are PropertyNode [2]
>> objects, which have a key and (who would have thought) more children.
>> Well, in this case, just one child, which is a TextNode. In a
>> nutshell: Find the "#switch" node, find children with keys "an" and
>> "pop", and generate triples for their values.
>>
>> > case _ =>
>> > })
>> >
>> > And instead of getting "Données/Antony/évolution population : pop : 61793"
>> > just once,
>> >
>> > I get "Données/Antony/évolution population : pop : null" many times, as many
>> > as there are lines in the string.
>> >
>> > Any idea what I am doing wrong?
>> >
>> > I'm a total beginner in Scala :-( sorry.
>>
>> Your code excerpt looks pretty good to me. :-)
>>
>> The AST is usually much safer and cleaner than regexes. Regexes are
>> more suitable for unstructured strings, but here you're dealing with
>> pretty clean structures. So I would suggest you write some code that
>> walks through the PageNode tree. If you have any questions, don't
>> hesitate to ask. We're looking forward to your contributions. Thanks!
>>
>> Cheers,
>> JC
>>
>> [1]
>> https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/wikiparser/ParserFunctionNode.scala
>> [2]
>> https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/wikiparser/PropertyNode.scala
>>
>> >
>> > Best.
>> >
>> > Julien.
>> >
>> >
>> > 2013/4/22 Jona Christopher Sahnwaldt <[email protected]>
>> >>
>> >> The templates where data is stored are not used directly in the main
>> >> pages. It's a complicated process: page Toulouse uses template X, X uses Y,
>> >> Y uses Z, and Z contains the data. Something like that, I'm not 100% sure,
>> >> but the details don't matter. This means that wikiPageUsesTemplate and
>> >> InfoboxExtractor won't help.
>> >>
>> >> Generating a separate file is probably the best idea. We could also send
>> >> these new triples to the main mapping based file, but that might be
>> >> confusing: first, they're not mapping based; second, new triples about a
>> >> city would be added in a completely different place in the file. (That's not
>> >> a big problem though.)
>> >>
>> >> Cheers,
>> >> JC
>> >
>> >
>>
>
>
_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion