Hi,

"param" came from a bad copy-paste; the correct variable is "pop".
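
A side note on the regex quoted below, offered as a minimal sketch rather
than a definitive diagnosis: in a Scala (Java) regex an unescaped "|" means
alternation, so """|pop=(\d+)""" reads as "empty string OR pop=(\d+)". The
empty alternative matches at every position and leaves the capture group
unset, which would explain the repeated "null". The sample string and the
names PopulationRegexSketch, unescaped, escaped and populations are made up
for illustration.

```scala
object PopulationRegexSketch {
  // Unescaped: "|" is alternation, so this is "(empty string) OR pop=(\d+)".
  val unescaped = """|pop=(\d+)""".r

  // Escaped: "\|" matches a literal pipe character before "pop=".
  val escaped = """\|pop=(\d+)""".r

  // Hypothetical sample standing in for the template source text.
  val sample = "{{#switch: {{{1|}}}\n|an=2010\n|pop=61793\n}}"

  // Collect the first capture group of every match of the escaped regex.
  def populations(text: String): List[String] =
    escaped.findAllMatchIn(text).map(_.group(1)).toList
}
```

With the escaped pattern, PopulationRegexSketch.populations(sample) yields
only the digits after "|pop=", while every match of the unescaped pattern is
empty and has a null group.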

By the way, thank you for the hint about the AST. I will take a look at these
classes and see how I can use them. I won't hesitate to ask if I get stuck :-)
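
To check that I understood the suggestion, here is a rough sketch of the
tree walk: find the "#switch" node, then read the children with keys "an"
and "pop". The node classes below are simplified stand-ins I wrote for this
example, NOT the real org.dbpedia.extraction.wikiparser classes (whose
constructors and fields may differ), and I am only assuming that the parser
function's title is "#switch".

```scala
// Simplified stand-in node types (hypothetical, not the framework's classes).
sealed trait Node { def children: List[Node] }
case class TextNode(text: String) extends Node { val children: List[Node] = Nil }
case class PropertyNode(key: String, children: List[Node]) extends Node
case class ParserFunctionNode(title: String, children: List[Node]) extends Node
case class PageNode(title: String, children: List[Node]) extends Node

object SwitchWalkSketch {
  // Depth-first search for every "#switch" parser function at or below a node.
  // Assumption: the node's title is exactly "#switch" after trimming.
  def switches(node: Node): List[ParserFunctionNode] = {
    val here = node match {
      case pf @ ParserFunctionNode(title, _) if title.trim == "#switch" => List(pf)
      case _ => Nil
    }
    here ++ node.children.flatMap(switches)
  }

  // Concatenated text of the first branch whose key matches, if any.
  def value(switchNode: ParserFunctionNode, key: String): Option[String] =
    switchNode.children.collectFirst {
      case PropertyNode(k, kids) if k.trim == key =>
        kids.collect { case TextNode(t) => t }.mkString.trim
    }
}
```

For each node returned by switches(page), value(node, "an") and
value(node, "pop") would then give the strings to turn into triples.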

Best.

Julien.


2013/4/22 Jona Christopher Sahnwaldt <[email protected]>

> Hi Julien,
>
> On 22 April 2013 21:43, Julien Plu <[email protected]>
> wrote:
> > I started the code for the extractor and I have a problem with the regex
> > in Scala. The string is:
> >
> > http://fr.wikipedia.org/w/index.php?title=Mod%C3%A8le:Donn%C3%A9es/Antony/%C3%A9volution_population&action=edit
> >
> > And my regex is: val populationRegex = """|pop=(\d+)""".r
> >
> > And I use this piece of code :
> >
> > populationRegex findAllIn  page.children.toString foreach (_ match {
> >     case populationRegex (pop) => println(page.title.decoded + " : pop : " + param)
>
> What is param?
>
> But more generally - did you try using the AST (abstract syntax tree)
> built by the parser, i.e. the tree whose root node is the PageNode?
> I'm not sure how good our parser is at dealing with stuff like
> "<includeonly>" and "{{#switch ...}}", but I think it works and
> page.children should contain a ParserFunctionNode [1] object for the
> #switch, which in turn has a child for each branch, e.g. one child for
> an=2010 and one for pop=61793. These children are PropertyNode [2]
> objects, which have a key and (who would have thought) more children.
> Well, in this case, just one child, which is a TextNode. In a
> nutshell: Find the "#switch" node, find children with keys "an" and
> "pop", and generate triples for their values.
>
> >     case _ =>
> > })
> >
> > And instead of getting "Données/Antony/évolution population : pop : 61793"
> > just once,
> >
> > I get "Données/Antony/évolution population : pop : null" many times, as
> > many times as there are lines in the string.
> >
> > Any idea what I'm doing wrong?
> >
> > I'm a total beginner in Scala :-( sorry.
>
> Your code excerpt looks pretty good to me. :-)
>
> The AST is usually much safer and cleaner than regexes. Regexes are
> more suitable for unstructured strings, but here you're dealing with
> pretty clean structures. So I would suggest you write some code that
> walks through the PageNode tree. If you have any questions, don't
> hesitate to ask. We're looking forward to your contributions. Thanks!
>
> Cheers,
> JC
>
> [1]
> https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/wikiparser/ParserFunctionNode.scala
> [2]
> https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/wikiparser/PropertyNode.scala
>
> >
> > Best.
> >
> > Julien.
> >
> >
> > 2013/4/22 Jona Christopher Sahnwaldt <[email protected]>
> >>
> >> The templates where data is stored are not used directly in the main
> >> pages. It's a complicated process: page Toulouse uses template X, X
> >> uses Y, Y uses Z, and Z contains the data. Something like that, I'm
> >> not 100% sure, but the details don't matter. This means that
> >> wikiPageUsesTemplate and InfoboxExtractor won't help.
> >>
> >> Generating a separate file is probably the best idea. We could also send
> >> these new triples to the main mapping based file, but that might be
> >> confusing: first, they're not mapping based; second, new triples about a
> >> city would be added in a completely different place in the file.
> >> (That's not a big problem though.)
> >>
> >> Cheers,
> >> JC
> >
> >
>
_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
