[
https://issues.apache.org/jira/browse/SOLR-15777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17452461#comment-17452461
]
Michael Gibney commented on SOLR-15777:
---------------------------------------
Not sure I'll have the time to dig deeper into this in the near term. If this
is a pressing issue for you (as your comment about "51 sites" suggests it may
be), I wonder if you might be able to address this issue with configuration
changes?
I trust someone will correct me if I'm off the mark here, but here's my
thinking at the moment:
It looks like you're trying to return the raw binary collation key values to
the client. Solr is fetching the binary/processed keys from docValues (where
they're used internally for sorting) and treating them as if they were valid
UTF-8, which (appropriately, iiuc) they are not. The only reason I can think of
to return these raw collation keys to the client is if you're going to use them
for client-side sorting, which strikes me as a very specialized use case. It's
a plausible use case I guess, but in that case you'd want/need to get the field
value as raw binary (probably base64-encoded in the JSON response writer).
I think it would make sense for Solr to support the above "binary sort key" use
case (which it evidently does not at present; it errors out unhelpfully). But,
going out on a limb here, it seems more likely that either:
# You are being returned this value by default (e.g., {{fl=*}},
{{useDocValuesAsStored=true}}), and you actually don't care about it, and could
probably explicitly add {{useDocValuesAsStored=false}} to the field or
fieldType definition and call it a day, or
# You want the original input field value for display, in which case it should
suffice for you to add {{stored=true}} to the field or fieldType definition
(this would require a full reindex to have the desired effect).
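For illustration, here is an untested sketch of what the two options might look
like when applied to the {{dynamicField}} from the issue description. Pick one
of the two definitions, not both; everything other than the changed attribute is
copied from the reporter's config, and I'm assuming the change is made on the
field rather than on the fieldType:
{code:xml}
<!-- Option 1: keep docValues for sorting, but stop returning them as if stored -->
<dynamicField name="sort_X3b_bg_*" type="collated_bg" stored="false"
              indexed="false" docValues="true" useDocValuesAsStored="false"/>

<!-- Option 2 (alternative): store the original input value for display.
     Requires a full reindex before existing docs return the stored value. -->
<dynamicField name="sort_X3b_bg_*" type="collated_bg" stored="true"
              indexed="false" docValues="true"/>
{code}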
Again, I think there is ideally work to be done here on the Solr side, but
there's a good chance that you can mitigate this issue with existing
configuration options (and with quicker turnaround, incidentally).
> UTF8toUTF16 failing for Unicode Character “ᴙ” (U+1D19)
> ------------------------------------------------------
>
> Key: SOLR-15777
> URL: https://issues.apache.org/jira/browse/SOLR-15777
> Project: Solr
> Issue Type: Bug
> Components: query
> Affects Versions: 7.7.3
> Reporter: Parag Ninawe
> Priority: Major
>
> This issue was seen for the Bulgarian language, specifically with the inverse
> R Unicode character “ᴙ” (U+1D19).
>
> # Indexing documents worked fine.
> # Querying produced the error below under the following conditions.
> The relevant Solr config (the field type and dynamic field for which the
> error is thrown on querying) is:
> {code:java}
> <fieldType name="collated_bg" class="solr.ICUCollationField" locale="bg"
> strength="primary" caseLevel="false"/>{code}
> {code:java}
> <dynamicField name="sort_X3b_bg_*" type="collated_bg" stored="false"
> indexed="false" docValues="true" />{code}
> A sample indexed document:
> {code:java}
> { "id": "testdoc", "sort_X3b_bg_title": "я" }{code}
>
> A select query for this doc by id gives the following error from Solr:
>
> {code:java}
> { "error":{ "msg":"121", "trace":"java.lang.ArrayIndexOutOfBoundsException:
> 121\n\tat
> org.apache.lucene.util.UnicodeUtil.UTF8toUTF16(UnicodeUtil.java:602)\n\tat
> org.apache.lucene.util.BytesRef.utf8ToString(BytesRef.java:137)\n\tat
> org.apache.solr.search.SolrDocumentFetcher.decodeDVField(SolrDocumentFetcher.java:550)\n\tat
>
> org.apache.solr.search.SolrDocumentFetcher.decorateDocValueFields(SolrDocumentFetcher.java:506)\n\tat
>
> org.apache.solr.search.SolrDocumentFetcher$RetrieveFieldsOptimizer.getSolrDoc(SolrDocumentFetcher.java:800)\n\tat
>
> org.apache.solr.search.SolrDocumentFetcher$RetrieveFieldsOptimizer.access$000(SolrDocumentFetcher.java:672)\n\tat
>
> org.apache.solr.search.SolrDocumentFetcher.solrDoc(SolrDocumentFetcher.java:278)\n\tat
> org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:95)\n\tat
> org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:59)\n\tat
> org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:184)\n\tat
>
> org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:136)\n\tat
>
> org.apache.solr.common.util.JsonTextWriter.writeNamedListAsMapWithDups(JsonTextWriter.java:386)\n\tat
>
> org.apache.solr.common.util.JsonTextWriter.writeNamedList(JsonTextWriter.java:292)\n\tat
> org.apache.solr.response.JSONWriter.writeResponse(JSONWriter.java:73)\n\tat
> org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:66)\n\tat
>
> org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)\n\tat
>
> org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:811)\n\tat
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:540)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:395)\n\tat
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:341)\n\tat
>
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)\n\tat
>
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)\n\tat
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat
>
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
>
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat
>
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1588)\n\tat
>
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat
>
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)\n\tat
>
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat
>
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)\n\tat
>
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1557)\n\tat
>
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\tat
>
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)\n\tat
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat
>
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)\n\tat
>
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat
>
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat
>
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
> org.eclipse.jetty.server.Server.handle(Server.java:502)\n\tat
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:364)\n\tat
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)\n\tat
>
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)\n\tat
> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat
> org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)\n\tat
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)\n\tat
>
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)\n\tat
>
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)\n\tat
>
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)\n\tat
>
> org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)\n\tat
>
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)\n\tat
>
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)\n\tat
> java.lang.Thread.run(Thread.java:748)\n", "code":500}}{code}
>