[ https://issues.apache.org/jira/browse/ANY23-154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16666659#comment-16666659 ]

Hudson commented on ANY23-154:
------------------------------

SUCCESS: Integrated in Jenkins build Any23-trunk #1643
(See [https://builds.apache.org/job/Any23-trunk/1643/])
ANY23-154 allow unused itemprops
(hans: rev 36682ccdfbddcd924cb5840e25d956f581e7125f)
* (edit) core/src/test/java/org/apache/any23/extractor/microdata/MicrodataExtractorTest.java
* (edit) core/src/main/java/org/apache/any23/extractor/microdata/ItemPropValue.java
* (edit) core/src/main/java/org/apache/any23/extractor/microdata/MicrodataParser.java
* (add) test-resources/src/test/resources/microdata/unused-itemprop.html


> Not able to extract microdata in few test cases
> -----------------------------------------------
>
>                 Key: ANY23-154
>                 URL: https://issues.apache.org/jira/browse/ANY23-154
>             Project: Apache Any23
>          Issue Type: Bug
>          Components: microdata
>    Affects Versions: 0.7.0
>         Environment: Windows 7 32bit
> JDK 1.6.0_38
> Intel Core 2 duo and 4GB RAM
>            Reporter: Kunal P
>            Assignee: Hans Brende
>            Priority: Major
>             Fix For: 2.3
>
>         Attachments: XOYRVIbK.part, neeraj.nowfloats.com.htm
>
>
> We are using the Apache Any23 API to extract microdata from a given web
> page as part of an internal project.
> We have some test cases where the API is not able to parse the microdata,
> for example www.neeraj.nowfloats.com (the web page does not strictly follow
> schema.org standards).
> Here is a snippet of the HTML code:
> <div id="someid" itemprop="offer" itemscope
>      itemtype="http://schema.org/Offer">
>   <div ... ></div>
> </div>
> Because the tag carries both itemscope and itemprop, the markup declares
> this item to be a property of some parent item. However, the
> <div id="someid"> tag has no parent item.
> The code used for extracting ItemScopes is as follows:
> import org.apache.any23.extractor.microdata.ItemScope;
> import org.apache.any23.extractor.microdata.MicrodataParser;
> import org.apache.any23.extractor.microdata.MicrodataParserReport;
> import org.w3c.dom.Document;
>
> // getDomDocument is our own helper that parses the HTML string into a DOM.
> Document dom = getDomDocument(html);
> MicrodataParserReport report = MicrodataParser.getMicrodata(dom);
> ItemScope[] items = report.getDetectedItemScopes();
> Here, items does not contain any ItemScope for the above test case.
> In such a scenario, how can we extract microdata from the page using the
> Any23 API?
> Is there any way to relax the criterion that itemprop and itemscope must
> not appear in the same tag, so that we get the data from the web page?
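
For readers who want to reproduce the case described above without Any23, the following is a minimal sketch that uses only the JDK's DOM parser. It is not Any23's implementation and does not reflect how the fix in this commit works. The class name OrphanItemScopes and both method names are hypothetical, the input is assumed to be well-formed XHTML (so the boolean attribute is written as itemscope=""), and itemref is ignored for simplicity. It lists the ids of elements that carry both itemscope and itemprop but have no enclosing itemscope, which is the situation in the reported snippet.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of Any23.
public class OrphanItemScopes {

    /** Returns true if any ancestor element of el carries an itemscope attribute. */
    static boolean hasItemScopeAncestor(Element el) {
        for (Node p = el.getParentNode(); p instanceof Element; p = p.getParentNode()) {
            if (((Element) p).hasAttribute("itemscope")) {
                return true;
            }
        }
        return false;
    }

    /**
     * Collects the ids of elements that have both itemscope and itemprop
     * but no enclosing itemscope. Assumes well-formed XHTML input.
     */
    static List<String> findOrphanItemScopeIds(String xhtml) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        List<String> ids = new ArrayList<>();
        NodeList all = dom.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element el = (Element) all.item(i);
            if (el.hasAttribute("itemscope") && el.hasAttribute("itemprop")
                    && !hasItemScopeAncestor(el)) {
                ids.add(el.getAttribute("id"));
            }
        }
        return ids;
    }

    public static void main(String[] args) throws Exception {
        // Same shape as the reported snippet; the inner span is illustrative.
        String xhtml = "<html><body>"
                + "<div id=\"someid\" itemprop=\"offer\" itemscope=\"\""
                + " itemtype=\"http://schema.org/Offer\">"
                + "<span itemprop=\"name\">Sample</span>"
                + "</div></body></html>";
        System.out.println(findOrphanItemScopeIds(xhtml));
    }
}
```

A page whose items are all properly nested would yield an empty list, so a check like this can help tell whether a missing ItemScope comes from this markup pattern or from something else.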



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
