[
https://issues.apache.org/jira/browse/TIKA-4154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774230#comment-17774230
]
Tim Allison edited comment on TIKA-4154 at 10/11/23 8:38 PM:
-------------------------------------------------------------
I don't think your problem in OpenSearch is caused by Tika, but we should
probably offer a similar way for people to configure this value as well in
plain Tika. This will likely bite someone at some point.
I'm not sure what the simplest way to let users configure this is. We use
Jackson in the emitters, tika-batch, tika-server and tika-eval. I'd rather not
have to track that parameter for each use, and I'm also not fond of system
properties.
Not thrilled with this one either, but maybe a static/global variable in
TikaConfig?
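
A minimal sketch of what a static/global setting could look like, so each
Jackson call site reads one shared value instead of tracking its own
parameter. This is an assumption for discussion, not existing Tika code: the
class JsonLimits and its methods are hypothetical, and the sketch avoids the
Jackson dependency so it runs on its own. In real use, the stored value would
presumably be handed to Jackson via
StreamReadConstraints.builder().maxStringLength(n).build() and
JsonFactory.builder().streamReadConstraints(...).

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical process-wide holder for the JSON string-length limit.
 * Not a real Tika or Jackson class; all names here are illustrative.
 */
final class JsonLimits {

    /** Same value as the Jackson default quoted in this issue. */
    static final int DEFAULT_MAX_STRING_LEN = 20_000_000;

    // AtomicInteger so that an update made by one thread is visible to others.
    private static final AtomicInteger MAX_STRING_LEN =
            new AtomicInteger(DEFAULT_MAX_STRING_LEN);

    private JsonLimits() {
    }

    /** Sets the shared limit; rejects non-positive values. */
    static void setMaxStringLength(int maxLen) {
        if (maxLen <= 0) {
            throw new IllegalArgumentException("maxLen must be > 0: " + maxLen);
        }
        MAX_STRING_LEN.set(maxLen);
    }

    /** Returns the limit that every call site should use. */
    static int getMaxStringLength() {
        return MAX_STRING_LEN.get();
    }

    /**
     * Stand-in for the check Jackson performs internally. With Jackson on the
     * classpath, a call site would instead build its factory with
     * StreamReadConstraints.builder()
     *     .maxStringLength(JsonLimits.getMaxStringLength()).build().
     */
    static void validateStringLength(int length) {
        int max = getMaxStringLength();
        if (length > max) {
            throw new IllegalStateException("String length (" + length
                    + ") exceeds the maximum length (" + max + ")");
        }
    }
}
```

With something like this, a user would call
JsonLimits.setMaxStringLength(50_000_000) once at startup, and the emitters,
tika-batch, tika-server and tika-eval would all pick up the same limit. The
usual downside of global state applies: the value has to be set before any
JsonFactory is built from it.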
was (Author: [email protected]):
I don't think your problem in OpenSearch is caused by Tika, but we should
probably offer a similar way for people to configure this value as well in
plain Tika. This will likely bite someone at some point.
I'm not sure what the simplest way to let users configure this is. We use
jackson in emitters, tika-batch, tika-server and tika-eval. I'd rather not have
to track that parameter for each use, and I also am not fond of system
properties.
Global variable in tika-config?
> Make DEFAULT_MAX_STRING_LEN in StreamReadConstraints configurable
> -----------------------------------------------------------------
>
> Key: TIKA-4154
> URL: https://issues.apache.org/jira/browse/TIKA-4154
> Project: Tika
> Issue Type: Improvement
> Components: core
> Affects Versions: 2.9.0
> Environment: In a Java application running with 8 GB JVM on Ubuntu OS
> Reporter: Vishal Ranjan
> Priority: Critical
>
> In "com.fasterxml.jackson.core", StreamReadConstraints limits string length
> to 20M through the DEFAULT_MAX_STRING_LEN constant. To handle larger text, we
> need a much larger value. We want this limit to be configurable so that we
> can adjust it as required.
> This constraint was added in the 2.15.0 release.
> public class StreamReadConstraints {
>     public static final int DEFAULT_MAX_STRING_LEN = 20_000_000;
>     // remaining members omitted
> }
> Snippet of exception received:
> com.fasterxml.jackson.core.exc.StreamConstraintsException: String length (20054016) exceeds the maximum length (20000000)
> at com.fasterxml.jackson.core.StreamReadConstraints.validateStringLength(StreamReadConstraints.java:324)
> at com.fasterxml.jackson.core.util.ReadConstrainedTextBuffer.validateStringLength(ReadConstrainedTextBuffer.java:27)
> at com.fasterxml.jackson.core.util.TextBuffer.finishCurrentSegment(TextBuffer.java:939)
> at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString2(UTF8StreamJsonParser.java:2584)
> at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString(UTF8StreamJsonParser.java:2529)
> at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getTextCharacters(UTF8StreamJsonParser.java:487)
> at com.fasterxml.jackson.core.JsonGenerator._copyCurrentStringValue(JsonGenerator.java:2777)
> at com.fasterxml.jackson.core.JsonGenerator._copyCurrentContents(JsonGenerator.java:2668)
> at com.fasterxml.jackson.core.JsonGenerator.copyCurrentStructure(JsonGenerator.java:2619)
> at
--
This message was sent by Atlassian Jira
(v8.20.10#820010)