[ https://issues.apache.org/jira/browse/ANY23-154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hans Brende resolved ANY23-154.
-------------------------------
Resolution: Fixed
Assignee: Hans Brende
> Not able to extract microdata in few test cases
> -----------------------------------------------
>
> Key: ANY23-154
> URL: https://issues.apache.org/jira/browse/ANY23-154
> Project: Apache Any23
> Issue Type: Bug
> Components: microdata
> Affects Versions: 0.7.0
> Environment: Windows 7 32bit
> JDK 1.6.0_38
> Intel Core 2 duo and 4GB RAM
> Reporter: Kunal P
> Assignee: Hans Brende
> Priority: Major
> Fix For: 2.3
>
> Attachments: XOYRVIbK.part, neeraj.nowfloats.com.htm
>
>
> We are using the Apache Any23 API to extract microdata from a given web page
> as part of an internal project.
> We have some test cases where the API is not able to parse the microdata.
> One example is www.neeraj.nowfloats.com (the web page does not strictly
> follow the schema.org standards).
> Here is a snippet of the HTML code:
> <div id="someid" itemprop="offer" itemscope
> itemtype="http://schema.org/Offer">
> <div ... ></div>
> </div>
> Because the tag carries both itemscope and itemprop, the item is declared as
> a property of some enclosing (parent) microdata item.
> However, the given <div id="someid"> tag has no enclosing microdata item.
> The code used for extracting ItemScopes is as follows:
> import org.apache.any23.extractor.microdata.ItemScope;
> import org.apache.any23.extractor.microdata.MicrodataParser;
> import org.apache.any23.extractor.microdata.MicrodataParserReport;
> import org.w3c.dom.Document;
> // getDomDocument is our own helper that parses the HTML string into a DOM
> Document dom = getDomDocument(html);
> MicrodataParserReport report = MicrodataParser.getMicrodata(dom);
> ItemScope[] items = report.getDetectedItemScopes();
> Here, items does not contain any ItemScope for the test case above.
> In such a scenario, how can we extract microdata from the page using the
> Any23 API? Is there any way to relax the restriction on itemprop and
> itemscope appearing in the same tag without an enclosing item, so that we
> still get the data from the web page?
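
For illustration only, the pattern described in the report (an element that has both itemprop and itemscope but no ancestor with itemscope) can be located with plain JDK DOM calls. This is a minimal sketch, not the Any23 implementation and not the fix applied for this issue. The class name OrphanItemFinder and its methods are hypothetical, and the sample markup is written as well-formed XHTML (itemscope="") so that the JDK XML parser accepts it; real-world HTML would need an HTML-aware parser instead.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Any23.
public class OrphanItemFinder {

    // True if any ancestor element of e carries an itemscope attribute.
    static boolean hasItemScopeAncestor(Element e) {
        for (Node p = e.getParentNode(); p != null; p = p.getParentNode()) {
            if (p instanceof Element && ((Element) p).hasAttribute("itemscope")) {
                return true;
            }
        }
        return false;
    }

    // Collects elements that have both itemscope and itemprop
    // but are not nested inside any other item.
    static List<Element> findOrphanItems(Document dom) {
        List<Element> result = new ArrayList<>();
        NodeList all = dom.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (e.hasAttribute("itemscope")
                    && e.hasAttribute("itemprop")
                    && !hasItemScopeAncestor(e)) {
                result.add(e);
            }
        }
        return result;
    }

    // Parses a well-formed XHTML string (assumption: input is valid XML).
    static Document parse(String xhtml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><body>"
                + "<div id=\"someid\" itemprop=\"offer\" itemscope=\"\""
                + " itemtype=\"http://schema.org/Offer\">"
                + "<div itemprop=\"name\">Sample</div></div>"
                + "</body></html>";
        for (Element e : findOrphanItems(parse(xhtml))) {
            // prints: someid -> http://schema.org/Offer
            System.out.println(e.getAttribute("id") + " -> " + e.getAttribute("itemtype"));
        }
    }
}
```

Elements found this way could then be handled as top-level items by the caller, which is one possible reading of "relaxing the criterion" asked about above.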