[ https://issues.apache.org/jira/browse/TIKA-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison updated TIKA-4550:
------------------------------
    Description: 
In main, the JSON configuration is now operative from 
tika-parsers-standard-package through the rest of the build. 

We still have remnants of TikaConfig in tika-app, but tika-server is JSON-only.

What do we want the user experience to be when migrating from 3.x to 4.x?

I think most of the conversions should be fairly straightforward; Claude was 
able to do a pretty good job, for example. I worry a bit about the parser 
configurations.

Mechanically at a high level:

0) Major version cutover/transition. Rely on documentation. Remove all of 
TikaConfig from main now.

1) Same as option 0, but try to backport the JSON deserialization into 3.x so 
that users can get ready.

2) Keep TikaConfig in main, and make it optional in tika-app and tika-server 
for 4.x

3) Something else?

My preference is for option 0. I worry about the amount of code and effort it 
would take to get options 1 and 2 right.

WDYT?

Things that would help with the transition:

0) Documentation, obviously.

1) Perhaps some code that would convert at least the parsers section for the 
tika-parsers-standard parsers from XML to JSON.

2) Other things?
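
As a rough illustration of item 1, a converter for the parsers section could 
start out something like the sketch below. It reads the 3.x-style elements 
(parser, parser-exclude, params/param) using only JDK classes. The JSON layout 
it emits ("parsers", "class", "exclude", "params") is purely an assumption 
made for illustration, not the actual 4.x schema, and the class name 
ParsersXmlToJsonSketch is hypothetical. Nested parser elements and param types 
other than bool/int/long are not handled.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical sketch: the emitted JSON layout is an assumption, not the real 4.x format.
public class ParsersXmlToJsonSketch {

    /** Converts the top-level parsers section of a 3.x-style XML config to a JSON string. */
    public static String convert(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Reject DOCTYPE declarations so untrusted configs cannot trigger XXE.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        List<String> parserObjects = new ArrayList<>();
        for (Element parsers : children(doc.getDocumentElement(), "parsers")) {
            for (Element parser : children(parsers, "parser")) {
                StringBuilder obj = new StringBuilder("{\"class\":")
                        .append(quote(parser.getAttribute("class")));

                // <parser-exclude class="..."/> becomes an "exclude" array (assumed key).
                List<String> excludes = new ArrayList<>();
                for (Element ex : children(parser, "parser-exclude")) {
                    excludes.add(quote(ex.getAttribute("class")));
                }
                if (!excludes.isEmpty()) {
                    obj.append(",\"exclude\":[").append(String.join(",", excludes)).append(']');
                }

                // <params><param name="..." type="...">v</param></params> becomes a "params" object.
                List<String> params = new ArrayList<>();
                for (Element paramsEl : children(parser, "params")) {
                    for (Element p : children(paramsEl, "param")) {
                        params.add(quote(p.getAttribute("name")) + ":" + value(p));
                    }
                }
                if (!params.isEmpty()) {
                    obj.append(",\"params\":{").append(String.join(",", params)).append('}');
                }
                parserObjects.add(obj.append('}').toString());
            }
        }
        return "{\"parsers\":[" + String.join(",", parserObjects) + "]}";
    }

    /** Returns only the direct child elements of parent with the given tag name. */
    private static List<Element> children(Element parent, String tag) {
        List<Element> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && tag.equals(n.getNodeName())) {
                out.add((Element) n);
            }
        }
        return out;
    }

    /** Emits bool/int/long params as bare JSON values and everything else as a string. */
    private static String value(Element param) {
        String text = param.getTextContent().trim();
        switch (param.getAttribute("type")) {
            case "bool":
                return String.valueOf(Boolean.parseBoolean(text));
            case "int":
            case "long":
                return String.valueOf(Long.parseLong(text));
            default:
                return quote(text);
        }
    }

    /** Minimal escaping only; a real converter would use a JSON library. */
    private static String quote(String s) {
        return '"' + s.replace("\\", "\\\\").replace("\"", "\\\"") + '"';
    }

    public static void main(String[] args) throws Exception {
        String xml = "<properties><parsers>"
                + "<parser class='org.apache.tika.parser.DefaultParser'>"
                + "<parser-exclude class='org.apache.tika.parser.pdf.PDFParser'/></parser>"
                + "<parser class='org.apache.tika.parser.pdf.PDFParser'>"
                + "<params><param name='sortByPosition' type='bool'>true</param></params>"
                + "</parser></parsers></properties>";
        System.out.println(convert(xml));
    }
}
```

Hand-building the JSON string keeps the sketch dependency-free. A real tool 
would presumably serialize through whatever JSON library the 4.x 
deserialization uses, so that the output is guaranteed to round-trip.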

  was:
In main, we now have the json configuration operative from 
tika-parsers-standard-package through the rest of the build. 

We still have remnants of TikaConfig in tika-app, but tika-server is json only.

What do we want the user experience to be when migrating from 3.x to 4.x?

I think most of the conversions should be fairly straightforward – claude was 
able to do a pretty good job, for example. I worry a bit about the parsers 
configurations.

Mechanically at a high level:

0) Abrupt transition. Rely on documentation. Remove all of TikaConfig from main 
now.

1) Same as option 0, but try to backport the json deserialization into 3.x so 
that users can get ready.

2) Keep TikaConfig in main, and make it optional in tika-app and tika-server 
for 4.x

3) Something else?

My preference is for option 0. I worry about the amount of code and effort it 
would take to get 1 and 2 right.

WDYT?

 

Things that would help with the transition:

0) documentation, obviously.

1) Perhaps some code that would convert at least the parsers section for the 
tika-parsers-standard parsers from xml to json.

2) other things?

 


> Determine migration path for xml->json configuration in 4.x
> -----------------------------------------------------------
>
>                 Key: TIKA-4550
>                 URL: https://issues.apache.org/jira/browse/TIKA-4550
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
