[jira] [Commented] (SLING-6783) Updates for Commons HTML

2018-05-08 Thread Oliver Lietz (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16467623#comment-16467623
 ] 

Oliver Lietz commented on SLING-6783:
-

[~jebailey], tests are fine on my local machine and on 
[Jenkins|https://builds.apache.org/view/S-Z/view/Sling/job/sling-org-apache-sling-commons-html-1.8/30/console]:
{noformat}
[INFO] --- maven-failsafe-plugin:2.20.1:integration-test (default) @ org.apache.sling.commons.html ---
[INFO] 
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running org.apache.sling.commons.html.it.TagsoupHtmlParserIT
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.485 s - in org.apache.sling.commons.html.it.TagsoupHtmlParserIT
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
[JENKINS] Recording test results
{noformat}

Can you set the {{timeout}} parameter on the {{Filter}} annotation of the parser injection and see whether that fixes your issue?
{noformat}
@Inject
@Filter(value = "(&(dom=tagsoup)(sax=tagsoup))")
private HtmlParser htmlParser;
{noformat}
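For reference, the suggested change would look like {{@Filter(value = "(&(dom=tagsoup)(sax=tagsoup))", timeout = 300000)}}, where 300000 is only an example value (presumably milliseconds). As far as I understand, Pax Exam waits a bounded time for a service matching the filter before the injection fails, so a longer timeout can help on a slow or busy machine. The stdlib-only Java sketch below illustrates that kind of bounded wait; it is not Pax Exam code, and the {{ServiceWait.await}} helper and its parameters are hypothetical.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper, NOT part of Pax Exam or OSGi: polls a lookup until it
// yields a service or the timeout elapses, mimicking a bounded service wait.
final class ServiceWait {

    private ServiceWait() {
    }

    static <T> T await(Supplier<Optional<T>> lookup, long timeoutMillis, long pollMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            Optional<T> service = lookup.get();
            if (service.isPresent()) {
                return service.get(); // found before the deadline
            }
            if (System.nanoTime() - deadline >= 0) {
                // Deadline passed without a match: give up like a failed injection
                throw new IllegalStateException("no matching service within " + timeoutMillis + " ms");
            }
            try {
                Thread.sleep(pollMillis); // back off before the next lookup
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                throw new IllegalStateException("interrupted while waiting for a service", e);
            }
        }
    }
}
```

A service that registers late is still found as long as it shows up before the deadline; one that never appears surfaces as an {{IllegalStateException}} once the timeout has elapsed.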


> Updates for Commons HTML
> 
>
> Key: SLING-6783
> URL: https://issues.apache.org/jira/browse/SLING-6783
> Project: Sling
>  Issue Type: Improvement
>  Components: Commons
>Reporter: Jason E Bailey
>Assignee: Oliver Lietz
>Priority: Minor
> Fix For: Commons HTML 1.0.2
>
> Attachments: sling.patch
>
>
> This issue contains the following updates:
> Updated the TagSoup library to 1.2.1, which includes the following changes:
> * DOCTYPE is now recognized even in lower case.
> * We make sure to buffer the reader, eliminating a long-standing bug that would crash on certain inputs, such as & followed by CR+LF.
> * The HTML scanner's table is precompiled at run time for efficiency, causing a 4x speedup on large input documents.
> * ]] within a CDATA section no longer causes input to be discarded.
> * Remove bogus newline after printing children of the root element.
> * Allow the noscript element anywhere, the same as the script element.
> * Updated to the 2011 edition of the W3C character entity list.
> Additionally:
> * Updated the license with the new TagSoup home page
> * Switched to OSGi annotations
> * Added the ability to specify additional features/properties for the parser
> * Documented the available settings
> * Fixed the Javadoc
> * Prepared for different parsers by renaming HtmlParserImpl and adding component properties
> * Improved the configuration



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLING-6783) Updates for Commons HTML

2018-05-08 Thread Jason E Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16467505#comment-16467505
 ] 

Jason E Bailey commented on SLING-6783:
---

[~olli] I'm running into problems with the Pax Exam test:
{noformat}
[INFO] Results:
[INFO] 
[ERROR] Errors: 
[ERROR]   TagsoupHtmlParserIT.testFeaturesConfiguration » IllegalState services vanished...
[ERROR]   TagsoupHtmlParserIT.testHtmlParser » IllegalState services vanished too fast.
[INFO] 
[ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0
{noformat}

Any idea? I'm not able to deploy.



[jira] [Commented] (SLING-6783) Updates for Commons HTML

2018-05-07 Thread Oliver Lietz (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466408#comment-16466408
 ] 

Oliver Lietz commented on SLING-6783:
-

[~jebailey], no – go ahead! And let's discuss modernization of Commons HTML and 
Rewriter at dev@.



[jira] [Commented] (SLING-6783) Updates for Commons HTML

2018-05-07 Thread Jason E Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466388#comment-16466388
 ] 

Jason E Bailey commented on SLING-6783:
---

[~olli] do you want to do the release on this?



[jira] [Commented] (SLING-6783) Updates for Commons HTML

2018-05-06 Thread Jason E Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465236#comment-16465236
 ] 

Jason E Bailey commented on SLING-6783:
---

[~olli] When I first ran into issues with this and HTML5 support, I looked around and found that there have been some forks of TagSoup that support HTML5. That's one option. Last I checked, jsoup was working on a SAX interface, but I don't know its current status.
Changing the API and creating a bridge from it into the Rewriter would be useful. It might also be time to take a look at the Rewriter and potentially rewrite it. Honestly, that would be my preferred option: moving to an event-based rewriting flow.
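To make the idea of an event-based rewriting flow more concrete, here is a hypothetical Java sketch: parsed HTML is represented as a sequence of start-tag, text and end-tag events, and each rewriting step is a plain {{UnaryOperator}} applied to every event in turn. None of these types ({{HtmlEvent}}, {{StartTag}}, {{EndTag}}, {{Text}}, {{RewritePipeline}}, {{LinkRewriter}}) exist in Commons HTML or the Rewriter, and the serializer does no escaping.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical event model for an event-based rewriting flow (not an existing Sling API).
interface HtmlEvent {
    String serialize(); // sketch only: no escaping of text or attribute values
}

final class StartTag implements HtmlEvent {
    final String name;
    final Map<String, String> attributes;

    StartTag(String name, Map<String, String> attributes) {
        this.name = name;
        this.attributes = new LinkedHashMap<>(attributes); // defensive copy, keeps order
    }

    @Override
    public String serialize() {
        StringBuilder sb = new StringBuilder("<").append(name);
        attributes.forEach((key, value) -> sb.append(' ').append(key).append("=\"").append(value).append('"'));
        return sb.append('>').toString();
    }
}

final class EndTag implements HtmlEvent {
    final String name;

    EndTag(String name) {
        this.name = name;
    }

    @Override
    public String serialize() {
        return "</" + name + ">";
    }
}

final class Text implements HtmlEvent {
    final String content;

    Text(String content) {
        this.content = content;
    }

    @Override
    public String serialize() {
        return content;
    }
}

// Applies each rewriting step to every event in order, then serializes the result.
final class RewritePipeline {
    private final List<UnaryOperator<HtmlEvent>> steps = new ArrayList<>();

    RewritePipeline add(UnaryOperator<HtmlEvent> step) {
        steps.add(step);
        return this;
    }

    String run(List<HtmlEvent> events) {
        StringBuilder out = new StringBuilder();
        for (HtmlEvent event : events) {
            HtmlEvent current = event;
            for (UnaryOperator<HtmlEvent> step : steps) {
                current = step.apply(current);
            }
            out.append(current.serialize());
        }
        return out.toString();
    }
}

final class LinkRewriter {
    private LinkRewriter() {
    }

    // Example step: prefixes root-relative href values on <a> start tags.
    static final UnaryOperator<HtmlEvent> PREFIX_LINKS = event -> {
        if (event instanceof StartTag && ((StartTag) event).name.equals("a")) {
            StartTag tag = (StartTag) event;
            Map<String, String> attrs = new LinkedHashMap<>(tag.attributes);
            attrs.computeIfPresent("href", (k, v) -> v.startsWith("/") ? "/content/site" + v : v);
            return new StartTag(tag.name, attrs);
        }
        return event;
    };
}
```

Running the pipeline with {{LinkRewriter.PREFIX_LINKS}} over the events for {{<a href="/docs">Docs</a>}} yields {{<a href="/content/site/docs">Docs</a>}}; the {{/content/site}} prefix is an arbitrary example. Because steps only see one event at a time, they compose without needing a DOM or the SAX {{ContentHandler}} chain.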



[jira] [Commented] (SLING-6783) Updates for Commons HTML

2018-05-04 Thread Oliver Lietz (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463833#comment-16463833
 ] 

Oliver Lietz commented on SLING-6783:
-

[~jebailey], [~klcodanr], I guess we have to change the API of Commons HTML (used in the Rewriter) and get rid of the SAX API to use a different parser for HTML5. I tried to plug in [AttoParser|https://www.attoparser.org] and [jsoup|https://jsoup.org], but neither fits properly. WDYT?



[jira] [Commented] (SLING-6783) Updates for Commons HTML

2018-05-03 Thread Jason E Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462390#comment-16462390
 ] 

Jason E Bailey commented on SLING-6783:
---

We should either support them or at least document what is and isn't supported from a features perspective. At this point I would just go with documentation; I'm much more interested in finding a way to make this HTML5-compliant than in features that no one has asked for yet.



[jira] [Commented] (SLING-6783) Updates for Commons HTML

2018-04-02 Thread Oliver Lietz (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422057#comment-16422057
 ] 

Oliver Lietz commented on SLING-6783:
-

Do we want to support additional parser properties besides {{lexical-handler}}?
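For context, {{lexical-handler}} is the standard SAX2 property {{http://xml.org/sax/properties/lexical-handler}}, which routes comments, CDATA boundaries and DTD/entity events to an {{org.xml.sax.ext.LexicalHandler}}. The sketch below applies generic feature and property maps to an {{XMLReader}}, roughly how a configurable parser could pass them through. It uses the JDK's built-in XML parser as a stand-in for TagSoup (so the input must be well-formed XML), and {{SaxConfigDemo}} and its methods are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical demo class; the JDK parser stands in for TagSoup here.
final class SaxConfigDemo {
    // Standard SAX2 identifiers
    static final String LEXICAL_HANDLER = "http://xml.org/sax/properties/lexical-handler";
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    private SaxConfigDemo() {
    }

    // DefaultHandler2 implements both ContentHandler and LexicalHandler.
    static final class CommentCollector extends DefaultHandler2 {
        final List<String> comments = new ArrayList<>();

        @Override
        public void comment(char[] ch, int start, int length) {
            comments.add(new String(ch, start, length));
        }
    }

    // Applies configured features and properties; unknown names throw SAXNotRecognizedException.
    static void configure(XMLReader reader, Map<String, Boolean> features,
            Map<String, Object> properties) throws SAXException {
        for (Map.Entry<String, Boolean> feature : features.entrySet()) {
            reader.setFeature(feature.getKey(), feature.getValue());
        }
        for (Map.Entry<String, Object> property : properties.entrySet()) {
            reader.setProperty(property.getKey(), property.getValue());
        }
    }

    static List<String> parseComments(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            CommentCollector collector = new CommentCollector();
            Map<String, Boolean> features = new LinkedHashMap<>();
            features.put(NAMESPACES, true);
            Map<String, Object> properties = new LinkedHashMap<>();
            properties.put(LEXICAL_HANDLER, collector); // comments arrive via this property
            configure(reader, features, properties);
            reader.setContentHandler(collector);
            reader.parse(new InputSource(new StringReader(xml)));
            return collector.comments;
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }
}
```

Without the {{lexical-handler}} property, a plain {{ContentHandler}} never sees comments, which is why it is the one property that matters most for a rewriting pipeline; other properties would go through the same generic map.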



[jira] [Commented] (SLING-6783) Updates for Commons HTML

2018-04-02 Thread Oliver Lietz (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422055#comment-16422055
 ] 

Oliver Lietz commented on SLING-6783:
-

[~jebailey], I also fixed the features/properties mess.



[jira] [Commented] (SLING-6783) Updates for Commons HTML

2018-04-01 Thread Oliver Lietz (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421804#comment-16421804
 ] 

Oliver Lietz commented on SLING-6783:
-

[~rombert], [~jebailey] NPE fixed. Please make sure tests are in place when changing existing functionality or adding new functionality.
