[ 
https://issues.apache.org/jira/browse/SOLR-15777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gibney updated SOLR-15777:
----------------------------------
    Description: 
{{ICUCollationField}} inherently uses sort-specific values in docValues 
(sorting based on these special-purpose values is the purpose of the field). 
These values differ substantially from input values, and in some cases the 
content of docValues is (appropriately) not even valid UTF-8. Despite this, 
{{ICUCollationField}} defaults to {{useDocValuesAsStored=true}}. As a result, 
if the field is not stored and the user requests that the field value be 
returned (either explicitly or implicitly via {{fl=\*}}), Solr at best returns 
a meaningless String value (potentially causing problems on the client side) 
and at worst throws a server-side error (see original issue description below).
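
As a rough illustration of why a binary sort key should not be treated as text, the sketch below compares the UTF-8 bytes of an input string with a collation key built for it, and checks each with a strict UTF-8 decoder. It uses the JDK's {{java.text.Collator}} as a stand-in for ICU, which is an assumption made only to keep the example dependency-free. The class and method names are hypothetical, and the key bytes it prints are not the bytes that {{ICUCollationField}} writes to docValues.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical demo class; not part of Solr or Lucene.
final class SortKeyDemo {

    /** Returns true only if the bytes are well-formed UTF-8 (strict decoding, no replacement). */
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Builds a binary sort key with the JDK collator (assumption: a stand-in for ICU). */
    public static byte[] sortKey(String input, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY);
        return collator.getCollationKey(input).toByteArray();
    }

    public static void main(String[] args) {
        String input = "\u044F"; // CYRILLIC SMALL LETTER YA, as in the sample document
        byte[] text = input.getBytes(StandardCharsets.UTF_8);
        byte[] key = sortKey(input, Locale.forLanguageTag("bg"));

        // The key is an opaque byte sequence meant for comparison, not for display.
        System.out.println("input bytes: " + Arrays.toString(text)
            + " valid UTF-8: " + isValidUtf8(text));
        System.out.println("sort key:    " + Arrays.toString(key)
            + " valid UTF-8: " + isValidUtf8(key));
    }
}
```

Whether a given key happens to decode as UTF-8 depends on the collator and the input, which is exactly why returning such bytes as a String value is unreliable.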

Original issue title: "UTF8toUTF16 failing for Unicode Character “ᴙ” (U+1D19)"
Original issue description:

This issue was seen with Bulgarian-language content, specifically with the 
reversed R Unicode character “ᴙ” (U+1D19).

 
 # Indexing documents succeeded.
 # Querying then failed with the error below, under the following conditions.

The Solr configuration (field type and dynamic field) for which the error is 
thrown at query time:
{code:xml}
<fieldType name="collated_bg" class="solr.ICUCollationField" locale="bg" 
strength="primary" caseLevel="false"/>{code}
{code:xml}
<dynamicField name="sort_X3b_bg_*" type="collated_bg" stored="false" 
indexed="false" docValues="true" />{code}
Sample indexed document:
{code:java}
{ "id": "testdoc", "sort_X3b_bg_title": "я" }{code}
 

Querying for this document by id (a select query) returns the following error 
from Solr:

{code:java}
{ "error":{ "msg":"121", "trace":"java.lang.ArrayIndexOutOfBoundsException: 121\n\tat
org.apache.lucene.util.UnicodeUtil.UTF8toUTF16(UnicodeUtil.java:602)\n\tat
org.apache.lucene.util.BytesRef.utf8ToString(BytesRef.java:137)\n\tat
org.apache.solr.search.SolrDocumentFetcher.decodeDVField(SolrDocumentFetcher.java:550)\n\tat
org.apache.solr.search.SolrDocumentFetcher.decorateDocValueFields(SolrDocumentFetcher.java:506)\n\tat
org.apache.solr.search.SolrDocumentFetcher$RetrieveFieldsOptimizer.getSolrDoc(SolrDocumentFetcher.java:800)\n\tat
org.apache.solr.search.SolrDocumentFetcher$RetrieveFieldsOptimizer.access$000(SolrDocumentFetcher.java:672)\n\tat
org.apache.solr.search.SolrDocumentFetcher.solrDoc(SolrDocumentFetcher.java:278)\n\tat
org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:95)\n\tat
org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:59)\n\tat
org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:184)\n\tat
org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:136)\n\tat
org.apache.solr.common.util.JsonTextWriter.writeNamedListAsMapWithDups(JsonTextWriter.java:386)\n\tat
org.apache.solr.common.util.JsonTextWriter.writeNamedList(JsonTextWriter.java:292)\n\tat
org.apache.solr.response.JSONWriter.writeResponse(JSONWriter.java:73)\n\tat
org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:66)\n\tat
org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)\n\tat
org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:811)\n\tat
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:540)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:395)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:341)\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1588)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1557)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:502)\n\tat
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:364)\n\tat
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)\n\tat
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)\n\tat
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat
org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)\n\tat
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)\n\tat
java.lang.Thread.run(Thread.java:748)\n", "code":500}}{code}
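
The top frames of the trace show the docValues bytes being converted to a String via {{BytesRef.utf8ToString}}. One plausible way such a conversion can end in an {{ArrayIndexOutOfBoundsException}} is a decoder that trusts each lead byte to say how many bytes follow and does not validate its input. The sketch below is a simplified, hypothetical decoder written for illustration only; it is not Lucene's code, and the class and method names are made up.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of a non-validating UTF-8 decoder; not Lucene's implementation.
final class NaiveUtf8 {

    /** Decodes without validation: the lead byte alone decides how many bytes are read. */
    public static String decodeTrusting(byte[] utf8) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < utf8.length) {
            int b = utf8[i++] & 0xff;
            if (b < 0xc0) {
                // Treated as a single-byte character.
                out.append((char) b);
            } else if (b < 0xe0) {
                // Two-byte form: reads one more byte without checking it exists.
                out.append((char) (((b & 0x1f) << 6) | (utf8[i++] & 0x3f)));
            } else if (b < 0xf0) {
                // Three-byte form: reads two more bytes without checking they exist.
                out.append((char) (((b & 0x0f) << 12)
                    | ((utf8[i] & 0x3f) << 6)
                    | (utf8[i + 1] & 0x3f)));
                i += 2;
            } else {
                // Four-byte form: reads three more bytes without checking they exist.
                int cp = ((b & 0x07) << 18)
                    | ((utf8[i] & 0x3f) << 12)
                    | ((utf8[i + 1] & 0x3f) << 6)
                    | (utf8[i + 2] & 0x3f);
                out.appendCodePoint(cp);
                i += 3;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Well-formed input decodes as expected.
        byte[] text = "\u044F".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeTrusting(text).equals("\u044F"));

        // Arbitrary binary data ending in a "lead byte" makes the decoder read past the array.
        byte[] binary = {(byte) 0xE1};
        try {
            decodeTrusting(binary);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("decoder overran its input: " + e);
        }
    }
}
```

A decoder like this is fast and correct for data that is known to be UTF-8, which is why feeding it collation-key bytes is the real problem rather than the decoder itself.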
 



> UTF8toUTF16 failing for Unicode Character “ᴙ” (U+1D19)
> ------------------------------------------------------
>
>                 Key: SOLR-15777
>                 URL: https://issues.apache.org/jira/browse/SOLR-15777
>             Project: Solr
>          Issue Type: Bug
>          Components: query
>    Affects Versions: 7.7.3
>            Reporter: Parag Ninawe
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h


