epugh commented on PR #3674:
URL: https://github.com/apache/solr/pull/3674#issuecomment-3481051804

   > > The PR introduces breaking changes (therefore backporting should 
probably be avoided). Apache Tika 2 and 3 standardized the metadata fields, 
which affects the returned fields.
   > 
   > I tackled that in the `tikaserver` backend by adding a Metadata mapper 
that, if enabled, will map from e.g. `dc.author` to `Author` to match what 
users might have come to expect in Tika 1.x. If you intend to pursue some 
upgrade in the 9.x line, re-using that class could perhaps make the upgrade 
somewhat more compatible. Whether it is compatible enough to warrant this 
breaking change in 9.x, I don't know.
   > 
   > I'd not be opposed to announcing that a "necessary" breaking change will 
happen in, say 9.11, due to security risks, and then prepare users for the 
change. I kept the mapping option hidden and undocumented, since I don't want us 
to have to support it. But one could offer a user-supplied map `{"from": "to", 
"from2": "to2"}` where she could tailor this. Or, perhaps that would not be 
needed since we already have the fmap feature able to map fields, e.g. 
`fmap.dc.author=Author`.
   
   I think this is reasonable. Upgrading 9.x to Tika 2 or 3 is a huge 
effort, and I don't think the payoff is there. The new pluggable backends 
give us a better path forward.
   
   Anyone using Tika needs to anticipate upgrading their codebase anyway for 
Solr 10.
   
   I think documenting either or both of the alternative approaches is 
fine. I suspect the vast majority of Tika users will either NOT upgrade 
or jump directly to Solr 10, which is IMO what they should do! Just the 
fact that we are moving from Tika 1 to Tika 3 means users will want to 
revalidate everything, so they won't be able to easily move between Solr 9 
versions anyway: we all know that Tika 3 is going to handle documents 
slightly differently than Tika 1 did, and users will need to 
test/validate/understand that.
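   
   To make the user-supplied map idea concrete, here is a rough sketch of the 
renaming step in Python. This is not the mapper class from the `tikaserver` 
backend; the function name `remap_metadata` and the `hypothetical.field` key are 
made up for illustration, and only the `dc.author` to `Author` pair comes from 
the discussion above.

```python
def remap_metadata(metadata, mapping):
    """Return a copy of `metadata` with keys renamed according to `mapping`.

    Keys that have no entry in `mapping` are passed through unchanged.
    """
    return {mapping.get(key, key): value for key, value in metadata.items()}


# Only dc.author -> Author is taken from the thread; the second key is hypothetical.
mapping = {"dc.author": "Author"}
extracted = {"dc.author": "Jane Doe", "hypothetical.field": "x"}

legacy = remap_metadata(extracted, mapping)
```

   The existing `fmap.dc.author=Author` request parameter expresses the same 
renaming per field, which is why a separate user-supplied map may not be needed.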


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

