[ 
https://issues.apache.org/jira/browse/TIKA-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17445155#comment-17445155
 ] 

ASF GitHub Bot commented on TIKA-3551:
--------------------------------------

sebastian-nagel commented on pull request #452:
URL: https://github.com/apache/tika/pull/452#issuecomment-971560076


   Yes, it duplicates the fix in 3206388. In case it's of interest, this PR 
includes a unit test. If not, I'll close the PR soon.




> TikaConfig: unspecified attribute of "xml-reader-utils" breaks configuration 
> file parser
> ----------------------------------------------------------------------------------------
>
>                 Key: TIKA-3551
>                 URL: https://issues.apache.org/jira/browse/TIKA-3551
>             Project: Tika
>          Issue Type: Bug
>          Components: config
>    Affects Versions: 1.27, 2.1.0
>            Reporter: Sebastian Nagel
>            Priority: Major
>             Fix For: 2.1.1
>
>
> The Tika configuration file parser exits with an exception when 
> XMLReaderUtils is configured in tika-config.xml without specifying all 
> possible attributes, e.g. without the attribute maxEntityExpansions (to use 
> the default value):
> {noformat}
> <xml-reader-utils poolSize="20" />
> {noformat}
> The code tests whether the attribute value is null. However, 
> [getAttribute()|https://docs.oracle.com/javase/8/docs/api/org/w3c/dom/Element.html#getAttribute-java.lang.String-]
>  returns the empty string if the attribute is not present, so the null check 
> never triggers. The empty string then causes a NumberFormatException:
> {noformat}
> 2021-09-14 09:57:12,357 ERROR o.a.n.p.t.TikaParser [main] Problem loading custom Tika configuration from tika-config.xml
> java.lang.NumberFormatException: For input string: ""
>         at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[?:?]
>         at java.lang.Integer.parseInt(Integer.java:662) ~[?:?]
>         at java.lang.Integer.parseInt(Integer.java:770) ~[?:?]
>         at org.apache.tika.config.TikaConfig.updateXMLReaderUtils(TikaConfig.java:303) ~[tika-core-1.25.jar:1.25]
>         at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:192) ~[tika-core-1.25.jar:1.25]
>         at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:182) ~[tika-core-1.25.jar:1.25]
>         at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:157) ~[tika-core-1.25.jar:1.25]
>         at org.apache.nutch.parse.tika.TikaParser.setConf(TikaParser.java:276) [parse-tika.jar:?]
> {noformat}
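
The behaviour described in the issue can be reproduced outside Tika with the JDK's DOM parser alone. The following is a minimal sketch, not the code from TikaConfig or from the actual fix: the class name AttributeDefaults, the helpers parseRoot and intAttribute, and the default values passed in are all hypothetical and chosen only for illustration. It shows that Element.getAttribute() yields "" for a missing attribute, and one possible way to fall back to a default instead of calling Integer.parseInt on the empty string.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical illustration class; not part of Tika.
class AttributeDefaults {

    // Parses an XML snippet and returns its root element.
    static Element parseRoot(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
    }

    // Returns the attribute as an int, or defaultValue when the attribute
    // is absent or blank. getAttribute() never returns null for a missing
    // attribute, so a null check alone is not sufficient.
    static int intAttribute(Element element, String name, int defaultValue) {
        if (!element.hasAttribute(name)) {
            return defaultValue;
        }
        String raw = element.getAttribute(name).trim();
        if (raw.isEmpty()) {
            return defaultValue;
        }
        return Integer.parseInt(raw);
    }

    public static void main(String[] args) throws Exception {
        Element element = parseRoot("<xml-reader-utils poolSize=\"20\" />");

        // The attribute is missing, yet the result is "" rather than null.
        String missing = element.getAttribute("maxEntityExpansions");
        System.out.println("null? " + (missing == null) + ", empty? " + missing.isEmpty());

        // The default values 10 and 7 are placeholders, not Tika's real defaults.
        System.out.println(intAttribute(element, "poolSize", 10));
        System.out.println(intAttribute(element, "maxEntityExpansions", 7));
    }
}
```

Checking hasAttribute() first keeps "attribute absent" distinct from "attribute present but empty"; here both cases fall back to the default, which is one design choice among several (an explicitly empty value could equally be rejected with an error).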



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
