[ https://issues.apache.org/jira/browse/ANY23-154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16666659#comment-16666659 ]
Hudson commented on ANY23-154:
------------------------------

SUCCESS: Integrated in Jenkins build Any23-trunk #1643 (See [https://builds.apache.org/job/Any23-trunk/1643/])
ANY23-154 allow unused itemprops (hans: rev 36682ccdfbddcd924cb5840e25d956f581e7125f)
* (edit) core/src/test/java/org/apache/any23/extractor/microdata/MicrodataExtractorTest.java
* (edit) core/src/main/java/org/apache/any23/extractor/microdata/ItemPropValue.java
* (edit) core/src/main/java/org/apache/any23/extractor/microdata/MicrodataParser.java
* (add) test-resources/src/test/resources/microdata/unused-itemprop.html


> Not able to extract microdata in few test cases
> -----------------------------------------------
>
>                 Key: ANY23-154
>                 URL: https://issues.apache.org/jira/browse/ANY23-154
>             Project: Apache Any23
>          Issue Type: Bug
>          Components: microdata
>    Affects Versions: 0.7.0
>         Environment: Windows 7 32bit
> JDK 1.6.0_38
> Intel Core 2 duo and 4GB RAM
>            Reporter: Kunal P
>            Assignee: Hans Brende
>            Priority: Major
>             Fix For: 2.3
>
>         Attachments: XOYRVIbK.part, neeraj.nowfloats.com.htm
>
>
> We are using the Apache Any23 API to extract microdata from a given web page
> as part of an internal project.
> We have some test cases where the API is not able to parse the microdata:
> www.neeraj.nowfloats.com (the web page does not follow schema.org standards
> strictly).
> Here is a snippet of the HTML code:
> <div id="someid" itemprop="offer" itemscope
> itemtype="http://schema.org/Offer">
> <div ... ></div>
> </div>
> The markup implies that this item is a property of some parent item, since
> the same tag carries both itemscope and itemprop.
> However, the <div id="someid"> tag has no parent microdata item.
> The method used for extracting ItemScopes is as follows:
> import org.apache.any23.extractor.microdata.ItemScope;
> import org.apache.any23.extractor.microdata.MicrodataParser;
> import org.apache.any23.extractor.microdata.MicrodataParserReport;
> Document dom = getDomDocument(html);
> MicrodataParserReport report = MicrodataParser.getMicrodata(dom);
> ItemScope[] items = report.getDetectedItemScopes();
> Here, items doesn't contain any ItemScope for the test case above.
> In such a scenario, how can we extract microdata from the page using the
> Any23 API?
> Is there any way to relax the criterion that itemprop and itemscope must not
> appear in the same tag, so that we get the data from the web page?
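
The commit message above ("allow unused itemprops") suggests treating an element such as <div id="someid"> as a top-level item when no enclosing item uses its itemprop. The following is only a minimal sketch of that idea, written against the JDK's DOM classes rather than Any23 itself. The class name UnusedItempropSketch and both of its methods are hypothetical, the input must be well-formed XML (hence itemscope=""), and itemref handling is ignored. It does not reproduce the actual change made to MicrodataParser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical illustration only; not part of the Any23 API.
public class UnusedItempropSketch {

    // True if some ancestor element carries an itemscope attribute.
    static boolean hasItemscopeAncestor(Element e) {
        for (Node p = e.getParentNode(); p instanceof Element; p = p.getParentNode()) {
            if (((Element) p).hasAttribute("itemscope")) {
                return true;
            }
        }
        return false;
    }

    // Collects the id of every element treated as a top-level item.
    static List<String> topLevelItemIds(String xhtml) {
        Document dom;
        try {
            dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("input is not well-formed XML", ex);
        }
        List<String> ids = new ArrayList<>();
        NodeList all = dom.getElementsByTagName("*"); // all elements, document order
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (!e.hasAttribute("itemscope")) {
                continue;
            }
            // Strict rule: an item is top-level only if it has no itemprop.
            boolean strictTopLevel = !e.hasAttribute("itemprop");
            // Relaxed rule (assumption): an itemprop with no enclosing item
            // is "unused", so the item is still reported as top-level.
            boolean unusedItemprop = e.hasAttribute("itemprop") && !hasItemscopeAncestor(e);
            if (strictTopLevel || unusedItemprop) {
                ids.add(e.getAttribute("id"));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        String page = "<html><body>"
                + "<div id=\"someid\" itemprop=\"offer\" itemscope=\"\""
                + " itemtype=\"http://schema.org/Offer\">"
                + "<div itemprop=\"name\">Example</div>"
                + "</div></body></html>";
        System.out.println(topLevelItemIds(page)); // prints [someid]
    }
}
```

Under the strict rule alone, the reported <div> would be skipped because it carries itemprop; the relaxed check is what lets it through. An item nested inside another item is still left to its parent.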