[https://issues.apache.org/jira/browse/TIKA-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444660#comment-17444660]
ASF GitHub Bot commented on TIKA-3551:
--------------------------------------
tballison commented on pull request #452:
URL: https://github.com/apache/tika/pull/452#issuecomment-970465384
I think this is fixed now, so can this be closed? If not, please let me
know.
> TikaConfig: unspecified attribute of "xml-reader-utils" breaks configuration file parser
> ----------------------------------------------------------------------------------------
>
> Key: TIKA-3551
> URL: https://issues.apache.org/jira/browse/TIKA-3551
> Project: Tika
> Issue Type: Bug
> Components: config
> Affects Versions: 1.27, 2.1.0
> Reporter: Sebastian Nagel
> Priority: Major
> Fix For: 2.1.1
>
>
> The Tika configuration file parser exits with an exception when
> XMLReaderUtils is configured in tika-config.xml without specifying every
> possible attribute, e.g. without the attribute maxEntityExpansions (so that
> its default value is used):
> {noformat}
> <xml-reader-utils poolSize="20" />
> {noformat}
> The code tests whether the attribute value is null; however,
> [getAttribute()|https://docs.oracle.com/javase/8/docs/api/org/w3c/dom/Element.html#getAttribute-java.lang.String-]
> returns an empty string, not null, if the attribute is not present. The
> empty string then causes a NumberFormatException:
> {noformat}
> 2021-09-14 09:57:12,357 ERROR o.a.n.p.t.TikaParser [main] Problem loading custom Tika configuration from tika-config.xml
> java.lang.NumberFormatException: For input string: ""
>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[?:?]
>     at java.lang.Integer.parseInt(Integer.java:662) ~[?:?]
>     at java.lang.Integer.parseInt(Integer.java:770) ~[?:?]
>     at org.apache.tika.config.TikaConfig.updateXMLReaderUtils(TikaConfig.java:303) ~[tika-core-1.25.jar:1.25]
>     at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:192) ~[tika-core-1.25.jar:1.25]
>     at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:182) ~[tika-core-1.25.jar:1.25]
>     at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:157) ~[tika-core-1.25.jar:1.25]
>     at org.apache.nutch.parse.tika.TikaParser.setConf(TikaParser.java:276) [parse-tika.jar:?]
> {noformat}
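The root cause described above can be reproduced with nothing but the JDK's DOM API. The following is a minimal sketch, not Tika code: the class name `XmlReaderUtilsAttributes`, the helper `intAttribute`, and the fallback values (10 and -1) are hypothetical placeholders, not Tika's actual defaults. It parses the element from the report and shows that `getAttribute()` yields an empty string rather than `null` for the missing `maxEntityExpansions`. It then shows `Integer.parseInt("")` failing as in the trace, and finally reads the attribute defensively by treating an empty value as "use the fallback". This is one possible guard and not necessarily how the upstream fix was written.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

// Hypothetical demo class; not part of Tika.
class XmlReaderUtilsAttributes {

    // Parses an XML snippet with the JDK's DOM parser and returns its root element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse: " + xml, e);
        }
    }

    // Hypothetical helper: reads an int attribute, returning the fallback when it is absent.
    // Element.getAttribute() returns "" (never null) for a missing attribute,
    // so the guard has to test for the empty string, not for null.
    static int intAttribute(Element element, String name, int fallback) {
        String value = element.getAttribute(name).trim();
        if (value.isEmpty()) {
            return fallback;
        }
        return Integer.parseInt(value);
    }

    public static void main(String[] args) {
        Element el = parse("<xml-reader-utils poolSize=\"20\" />");

        // The missing attribute comes back as an empty string, not null.
        String raw = el.getAttribute("maxEntityExpansions");
        System.out.println("null? " + (raw == null) + ", empty? " + raw.isEmpty());

        // A null check alone lets "" through, and parseInt("") then throws.
        try {
            Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            System.out.println("parseInt failed: " + e.getMessage());
        }

        // The guarded read returns the caller's fallback (placeholder values here).
        System.out.println("poolSize = " + intAttribute(el, "poolSize", 10));
        System.out.println("maxEntityExpansions = " + intAttribute(el, "maxEntityExpansions", -1));
    }
}
```

Treating an explicitly empty attribute (`maxEntityExpansions=""`) the same as a missing one is a choice made by this sketch; if an empty value should instead be rejected, `Element.hasAttribute()` can distinguish "absent" from "present but empty".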
--
This message was sent by Atlassian Jira
(v8.20.1#820001)