[ 
https://issues.apache.org/jira/browse/HADOOP-14501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16041163#comment-16041163
 ] 

Andrew Wang commented on HADOOP-14501:
--------------------------------------

FWIW, Solr uses Woodstox; it seems to be the second fastest after Aalto, and 
more mature:

https://github.com/FasterXML/woodstox
https://stackoverflow.com/a/11782775 (SO answer from author of woodstox and 
aalto)
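
To check a candidate StAX implementation against the general-entity and 
non-ASCII cases from the description below, here is a minimal sketch. It uses 
only the standard javax.xml.stream API, so it should run against whichever 
StAX provider is found on the classpath (the JDK's built-in one if none is 
added). The class name, method name, and the sample document with its "bar" 
entity are made up for illustration; they are not taken from the Solr tests.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Hadoop or Solr.
class StaxEntityCheck {

    // Returns the text content of the root element, with general
    // entity references expanded by the StAX implementation in use.
    static String rootText(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        // Ask the parser to expand internal entities such as &bar;
        factory.setProperty(
            XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.TRUE);
        // Merge adjacent text chunks into a single CHARACTERS event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);

        XMLStreamReader reader =
            factory.createXMLStreamReader(new StringReader(xml));
        try {
            // Skip the DTD event and anything else before the root element.
            while (reader.next() != XMLStreamConstants.START_ELEMENT) {
                // keep reading
            }
            return reader.getElementText();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        // Internal DTD subset declaring a general entity, plus the
        // character with code 252 (u with diaeresis) in the text.
        String xml = "<!DOCTYPE doc [<!ENTITY bar \"baz\">]>"
                   + "<doc>foo &bar; \u00fc</doc>";
        System.out.println(rootText(xml));
    }
}
```

A parser that fully supports entity expansion should return the text with 
&bar; replaced by its declared value, rather than throwing an exception 
like the ones quoted below.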

> aalto-xml cannot handle some odd XML features
> ---------------------------------------------
>
>                 Key: HADOOP-14501
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14501
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: conf
>    Affects Versions: 2.9.0, 3.0.0-alpha4
>            Reporter: Andrew Wang
>            Priority: Blocker
>
> [~hgadre] tried testing solr with a Hadoop 3 client. He saw various test case 
> failures due to what look like functionality gaps in the new aalto-xml stax 
> implementation pulled in by HADOOP-14216:
> {noformat}
>    [junit4]    > Throwable #1: com.fasterxml.aalto.WFCException: Illegal XML 
> character ('ü' (code 252))
> ....
>    [junit4]    > Caused by: com.fasterxml.aalto.WFCException: General entity 
> reference (&bar;) encountered in entity expanding mode: operation not (yet) 
> implemented
> ...
>    [junit4]    > Throwable #1: org.apache.solr.common.SolrException: General 
> entity reference (&wacky;) encountered in entity expanding mode: operation 
> not (yet) implemented
> {noformat}
> These were from the following test case executions:
> {noformat}
> NOTE: reproduce with: ant test  -Dtestcase=DocumentAnalysisRequestHandlerTest 
> -Dtests.method=testCharsetOutsideDocument -Dtests.seed=2F739D88D9C723CA 
> -Dtests.slow=true -Dtests.locale=und -Dtests.timezone=Atlantic/Faeroe 
> -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
> NOTE: reproduce with: ant test  -Dtestcase=MBeansHandlerTest 
> -Dtests.method=testXMLDiffWithExternalEntity -Dtests.seed=2F739D88D9C723CA 
> -Dtests.slow=true -Dtests.locale=en-US -Dtests.timezone=US/Aleutian 
> -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
> NOTE: reproduce with: ant test  -Dtestcase=XmlUpdateRequestHandlerTest 
> -Dtests.method=testExternalEntities -Dtests.seed=2F739D88D9C723CA 
> -Dtests.slow=true -Dtests.locale=hr -Dtests.timezone=America/Barbados 
> -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
> NOTE: reproduce with: ant test  -Dtestcase=XmlUpdateRequestHandlerTest 
> -Dtests.method=testNamedEntity -Dtests.seed=2F739D88D9C723CA 
> -Dtests.slow=true -Dtests.locale=hr -Dtests.timezone=America/Barbados 
> -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

