[ https://issues.apache.org/jira/browse/ANY23-389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hans Brende resolved ANY23-389.
-------------------------------
Resolution: Fixed
Assignee: Hans Brende
> RDFa extraction breaks when base element uses relative href
> -----------------------------------------------------------
>
> Key: ANY23-389
> URL: https://issues.apache.org/jira/browse/ANY23-389
> Project: Apache Any23
> Issue Type: Bug
> Components: extractors
> Affects Versions: 2.3
> Reporter: Hans Brende
> Assignee: Hans Brende
> Priority: Major
> Fix For: 2.3
>
>
> I noticed that when extracting from HTML such as this:
> {code:html}
> <html prefix="og: http://ogp.me/ns#">
> <head>
> <base href="">
> <link rel="icon" type="image/x-icon"
> href="https://static1.squarespace.com/static/55085720e4b0813599644fae/t/56291c91e4b0377cf53e5981/favicon.ico"/>
> <meta property="og:site_name" content="36°N"/>
> <meta property="og:title" content="36°N Friends & Family Night"/>
> <meta property="og:latitude" content="36.1604966"/>
> <meta property="og:longitude" content="-95.9889172"/>
> <meta property="og:street-address" content="201 North Elgin Avenue"/>
> <meta property="og:locality" content="Tulsa"/>
> <meta property="og:region" content="OK"/>
> <meta property="og:postal-code" content="74120"/>
> <meta property="og:country-name" content="United States"/>
> <meta property="og:url"
> content="https://www.36degreesnorth.co/events/2018/8/2/36n-friends-family-night"/>
> <meta property="og:type" content="website"/>
> <meta property="og:description" content="Hey 36°N Members! Grab your
> family or a close friend, and join us for a fun night at the ballpark. We
> reserved the Coors Light Refinery Deck at ONEOK Field, so we can all hang
> out, enjoy a buffet and watch the game in the shade. Dinner starts at 6:30.
> Game starts at 7:00. $5/person. $20/family (co"/>
> <meta property="og:image"
> content="http://static1.squarespace.com/static/55085720e4b0813599644fae/5768549715d5db9b150af935/5a62695653450a1e55940197/1528903903136/DRILLERS+FAMILY+NIGHT-+square.png?format=1000w"/>
> <meta property="og:image:width" content="800"/>
> <meta property="og:image:height" content="800"/>
> </head><body></body>
> </html>
> {code}
> none of the expected rdfa11 triples (neither the og properties nor the icon
> property) are extracted, apparently because the underlying rdfa11 parser
> requires an *absolute base href* rather than a relative (here, empty) one.
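
To illustrate the kind of normalization the description implies, here is a minimal sketch that resolves a relative or empty base href against the document URI before handing it to a parser. It is not the change committed for this issue: the class BaseHrefSketch, the helper absoluteBase, and the example.org URLs are all hypothetical, and only java.net.URI from the JDK is used.

```java
import java.net.URI;

public class BaseHrefSketch {

    // Hypothetical helper (not part of Any23): turn the value found in
    // <base href="..."> into an absolute URI by resolving it against the
    // URI the document was loaded from.
    public static URI absoluteBase(URI documentUri, String baseHref) {
        if (baseHref == null || baseHref.trim().isEmpty()) {
            // Assumption: a missing or empty href is treated as "use the
            // document URI itself". This case is handled explicitly instead
            // of being passed to URI.resolve.
            return documentUri;
        }
        // An absolute href is returned as given; a relative href is
        // resolved against the document URI.
        return documentUri.resolve(baseHref.trim());
    }

    public static void main(String[] args) {
        // Made-up document URI, used only for illustration.
        URI doc = URI.create("https://example.org/events/page.html");
        System.out.println(absoluteBase(doc, ""));
        System.out.println(absoluteBase(doc, "assets/"));
        System.out.println(absoluteBase(doc, "https://cdn.example.org/"));
    }
}
```

With a step like this, the parser would always receive an absolute base, whatever form the base element takes in the page.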