Repository: any23
Updated Branches:
  refs/heads/master 54c461269 -> 7f08621b2


Support attribute content on all fields.

`<sometag content="something" />` should be considered, regardless of
whether `content` is a valid attribute of `sometag`.

The microdata specification[1] states that an element's `content`
attribute should be considered before its text content.

Any23 doesn't currently do this. It only considers `content` for `meta`
tags, the only HTML tag that is supposed to have a `content` attribute,
but not all sites follow the HTML specification.

Update the microdata parser to read `content` from any element that
has it.

[1] https://www.w3.org/TR/microdata/#values
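
The lookup order described above can be sketched roughly as follows. This
is a minimal illustration using only the JDK's DOM API, not the Any23
code: the class `ContentAttributeSketch` and the helpers `readValue` and
`parseRoot` are hypothetical names, and the snippets are assumed to be
well-formed XML.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical sketch class; it does not exist in Any23.
class ContentAttributeSketch {

    /** Returns the `content` attribute when present, otherwise the text content. */
    static String readValue(Element element) {
        if (element.hasAttribute("content")) {
            return element.getAttribute("content");
        }
        return element.getTextContent();
    }

    /** Parses a well-formed XML snippet and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new IllegalStateException("could not parse snippet", e);
        }
    }

    public static void main(String[] args) {
        // `content` is present on a non-meta element, so it wins over the text.
        System.out.println(readValue(parseRoot(
                "<span itemprop=\"date\" content=\"2017-11-08\">8 November</span>")));
        // No `content` attribute, so the text content is used.
        System.out.println(readValue(parseRoot(
                "<span itemprop=\"name\">Any23</span>")));
    }
}
```

The check is on the presence of the attribute rather than on the tag
name, which is the same shift this change makes in the parser.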

Signed-off-by: Ian Duffy <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/any23/repo
Commit: http://git-wip-us.apache.org/repos/asf/any23/commit/28a68b53
Tree: http://git-wip-us.apache.org/repos/asf/any23/tree/28a68b53
Diff: http://git-wip-us.apache.org/repos/asf/any23/diff/28a68b53

Branch: refs/heads/master
Commit: 28a68b535285f9d084725728f758272a3eda21be
Parents: dfcccad
Author: Ian Duffy <[email protected]>
Authored: Wed Nov 8 13:59:42 2017 +0000
Committer: Ian Duffy <[email protected]>
Committed: Wed Nov 8 14:04:44 2017 +0000

----------------------------------------------------------------------
 .../java/org/apache/any23/extractor/microdata/MicrodataParser.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/any23/blob/28a68b53/core/src/main/java/org/apache/any23/extractor/microdata/MicrodataParser.java
----------------------------------------------------------------------
diff --git a/core/src/main/java/org/apache/any23/extractor/microdata/MicrodataParser.java b/core/src/main/java/org/apache/any23/extractor/microdata/MicrodataParser.java
index 8ee1cc6..147fd18 100644
--- a/core/src/main/java/org/apache/any23/extractor/microdata/MicrodataParser.java
+++ b/core/src/main/java/org/apache/any23/extractor/microdata/MicrodataParser.java
@@ -309,7 +309,7 @@ public class MicrodataParser {
         if(itemPropValue != null) return itemPropValue;
 
         final String nodeName = node.getNodeName().toLowerCase();
-        if ("meta".equals(nodeName)) {
+        if (DomUtils.hasAttribute(node, "content")) {
             return new ItemPropValue(DomUtils.readAttribute(node, "content"), ItemPropValue.Type.Plain);
         }
 
