[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849101#comment-17849101 ]
Tim Allison commented on TIKA-4243:
-----------------------------------

Fellow devs, in chatting with Nicholas, we're thinking that it would be useful for a number of use cases to overhaul the configuration in Tika 3.x. We'd leave in the legacy behavior, obviously!

To move forward, we're considering using ParseContext for both initialization and per-parse control in tika-server, tika-pipes, and probably tika-app. To do this, being able to serialize ParseContext is really important. Are we OK with adding jackson-annotations to tika-core? We wouldn't add any other Jackson modules to tika-core!!!

Alternatively, we could probably write the wrappers needed for tika-core objects and put those wrappers in tika-serialization.

> tika configuration overhaul
> ---------------------------
>
>                 Key: TIKA-4243
>                 URL: https://issues.apache.org/jira/browse/TIKA-4243
>             Project: Tika
>          Issue Type: New Feature
>          Components: config
>    Affects Versions: 3.0.0
>            Reporter: Nicholas DiPiazza
>            Priority: Major
>
> In 3.0.0, it would greatly help Tika users to have a typed configuration schema.
> In 3.x, can we remove the old way of doing configs and replace it with JSON Schema?
> JSON Schema can be converted to POJOs using a Maven plugin:
> [https://github.com/joelittlejohn/jsonschema2pojo]
> This automatically creates a Java POJO model we can use for the configs.
> It would also allow the legacy tika-config XML to be read and converted to the
> new POJOs easily using an XML mapper, so that users don't have to switch to JSON
> configurations yet if they do not want to.
> When complete, configurations can be set as XML, JSON, or YAML:
> tika-config.xml
> tika-config.json
> tika-config.yaml
> Replace all tika config annotations that use the old syntax with the POJO model
> deserialized from the XML/JSON/YAML.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
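The proposal's central mechanism, reading the legacy tika-config XML into a typed POJO model, can be sketched with nothing but the JDK. This is a hedged illustration, not Tika's implementation: the record names `ParserConfig` and `TikaConfigModel` and the `read` method are hypothetical, the XML is a simplified fragment modeled on the legacy `<properties>/<parsers>/<parser>/<params>` layout, and the JDK DOM parser stands in for whatever XML mapper the issue has in mind.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class LegacyConfigSketch {

    // Hypothetical typed model. In the proposal, classes like these would be
    // generated from a JSON Schema (e.g. by jsonschema2pojo), not hand-written.
    public record ParserConfig(String className, Map<String, String> params) {}

    public record TikaConfigModel(List<ParserConfig> parsers) {}

    // Maps a simplified legacy-style tika-config.xml onto the typed model.
    // Assumption: params are kept as raw strings; the legacy "type" attribute is ignored.
    public static TikaConfigModel read(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Refuse DOCTYPE declarations so external entities cannot be resolved.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        List<ParserConfig> parsers = new ArrayList<>();
        NodeList parserNodes = doc.getElementsByTagName("parser");
        for (int i = 0; i < parserNodes.getLength(); i++) {
            Element parser = (Element) parserNodes.item(i);
            // Collect <param name="...">value</param> entries in document order.
            Map<String, String> params = new LinkedHashMap<>();
            NodeList paramNodes = parser.getElementsByTagName("param");
            for (int j = 0; j < paramNodes.getLength(); j++) {
                Element param = (Element) paramNodes.item(j);
                params.put(param.getAttribute("name"), param.getTextContent().trim());
            }
            parsers.add(new ParserConfig(parser.getAttribute("class"),
                    Collections.unmodifiableMap(params)));
        }
        return new TikaConfigModel(List.copyOf(parsers));
    }

    public static void main(String[] args) throws Exception {
        String legacyXml = """
                <properties>
                  <parsers>
                    <parser class="org.apache.tika.parser.pdf.PDFParser">
                      <params>
                        <param name="sortByPosition" type="bool">true</param>
                      </params>
                    </parser>
                  </parsers>
                </properties>
                """;
        // Prints the record-based model built from the legacy XML.
        System.out.println(read(legacyXml));
    }
}
```

In the proposal, the same typed model would also be populated from tika-config.json or tika-config.yaml, so all three file formats converge on one representation that the rest of Tika consumes.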