[ 
https://issues.apache.org/jira/browse/SOLR-13255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16769161#comment-16769161
 ] 

Jan Høydahl edited comment on SOLR-13255 at 2/15/19 10:12 AM:
--------------------------------------------------------------

Attached a rough, untested patch for langid for branch_7_7.

Due to a refactor, the bug will look different in 8.x: it will probably just 
silently fail to detect any languages, since the list of String fields is 
determined through instanceof String. The patch for 8.x and master will thus 
need to fix SolrInputDocumentReader instead.
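
To illustrate the instanceof issue, here is a minimal sketch. It is not the 
attached patch and does not use Solr classes: a plain Map stands in for the 
document, a StringBuilder stands in for ByteArrayUtf8CharSequence (both are 
CharSequence but not String), and the class and method names are made up for 
this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LangIdFieldSketch {

    /** Strict check: only java.lang.String values are treated as text. */
    public static List<String> textFieldsStrict(Map<String, Object> doc) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            if (e.getValue() instanceof String) {
                names.add(e.getKey());
            }
        }
        return names;
    }

    /** Tolerant check: any CharSequence (String included) is treated as text. */
    public static List<String> textFieldsTolerant(Map<String, Object> doc) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            if (e.getValue() instanceof CharSequence) {
                names.add(e.getKey());
            }
        }
        return names;
    }

    /** Joins all text values with a single space, converting via toString(). */
    public static String concatText(Map<String, Object> doc) {
        StringBuilder sb = new StringBuilder();
        for (Object v : doc.values()) {
            if (v instanceof CharSequence) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(v.toString());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "1");
        // Stand-in for a value that arrives as a non-String CharSequence.
        doc.put("name_tokenized", new StringBuilder("Dette er en test"));
        doc.put("price", 42);

        System.out.println(textFieldsStrict(doc));   // [id]
        System.out.println(textFieldsTolerant(doc)); // [id, name_tokenized]
        System.out.println(concatText(doc));         // 1 Dette er en test
    }
}
```

The strict variant silently drops name_tokenized, which is the kind of 
failure described above. Checking against CharSequence and calling toString() 
keeps both value types working.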

I think that for 8.0 we should add an UPGRADE NOTE about this breaking change...


was (Author: janhoy):
Attached a rough, untested patch for langid for branch_7_7. Due to a refactor, 
the patch will be different for master and 8x, where we'll need to fix 
SolrInputDocumentReader instead, which also does instanceof String.

I think that for 8.0 we should add an UPGRADE NOTE about this breaking change...

> LanguageIdentifierUpdateProcessor broken for documents sent with SolrJ/javabin
> ------------------------------------------------------------------------------
>
>                 Key: SOLR-13255
>                 URL: https://issues.apache.org/jira/browse/SOLR-13255
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: contrib - LangId
>    Affects Versions: 7.7
>            Reporter: Andreas Hubold
>            Priority: Major
>             Fix For: 8.0, 7.7.1
>
>         Attachments: SOLR-13255.patch
>
>
> 7.7 changed the object type of string field values that are passed to 
> UpdateRequestProcessor implementations from java.lang.String to 
> ByteArrayUtf8CharSequence. SOLR-12992 was mentioned on solr-user as the cause.
> The LangDetectLanguageIdentifierUpdateProcessor still expects String values, 
> does not work for CharSequences, and logs warnings instead. For example:
> {noformat}
> 2019-02-14 13:14:47.537 WARN  (qtp802600647-19) [   x:studio] 
> o.a.s.u.p.LangDetectLanguageIdentifierUpdateProcessor Field name_tokenized 
> not a String value, not including in detection
> {noformat}
> I'm not sure, but there could be further places where the changed type for 
> string values needs to be handled. (Our custom UpdateRequestProcessors have 
> been broken as well since 7.7, and it would be great to have a proper upgrade 
> note as part of the release notes.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
