[
https://issues.apache.org/jira/browse/SOLR-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17648279#comment-17648279
]
Kevin Risden commented on SOLR-16589:
-------------------------------------
Sigh let it sit all day and then Jenkins fails with the following right as I
backport this:
{code:java}
2> 915665 INFO (TEST-LargeFieldTest.test-seed#[511A4AF18155F449]) []
o.a.s.SolrTestCaseJ4 ###Ending test
> org.junit.ComparisonFailure: expected:<ŧ)[]ꢵ ᠯﭚ盕혠> but
was:<ŧ)[#14;]ꢵ ᠯﭚ盕혠>
> at
__randomizedtesting.SeedInfo.seed([511A4AF18155F449:D94E752B2FA999B1]:0)
> at org.junit.Assert.assertEquals(Assert.java:117)
> at org.junit.Assert.assertEquals(Assert.java:146)
> at
org.apache.solr.search.LargeFieldTest.test(LargeFieldTest.java:114)
> at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
> at
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
> at
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
> at
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
> at
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:80)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at
org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
> at
org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
> at
org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at
org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
> at
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
> at
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
> at
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
> at
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
> at
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
> at
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
> at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:80)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at
org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
> at
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
> at
org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at
org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at
org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at
org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
> at
com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
> at java.base/java.lang.Thread.run(Thread.java:829)
2> NOTE: reproduce with: gradlew test --tests LargeFieldTest.test
-Dtests.seed=511A4AF18155F449 -Dtests.multiplier=2 -Dtests.locale=br-FR
-Dtests.timezone=Europe/Malta -Dtests.asserts=true -Dtests.file.encoding=UTF-8
{code}
I'll take a look either later tonight or tomorrow.
> Large fields with large="true" can be truncated when using unicode values
> -------------------------------------------------------------------------
>
> Key: SOLR-16589
> URL: https://issues.apache.org/jira/browse/SOLR-16589
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: search
> Affects Versions: 9.0, 9.1
> Reporter: Nikolas Osvalds
> Assignee: Kevin Risden
> Priority: Major
> Fix For: main (10.0)
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> h3. Summary
> For fields using large="true", large fields (which is what they are intended
> for) can be truncated in v9+ of Solr.
> Example fieldtype definition:
> {code:java}
> <fieldtype name="string_large" class="solr.TextField" multiValued="false"
> indexed="false" stored="true" omitNorms="true" large="true" />{code}
> h3. Cause
> Looks like this is a bug introduced along with
> https://issues.apache.org/jira/browse/LUCENE-8805 /
> https://github.com/apache/lucene/issues/9849
> The current code is here:
> https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/search/SolrDocumentFetcher.java#L511
>
> {code:java}
> public void stringField(FieldInfo fieldInfo, String value) throws IOException
> {
> Objects.requireNonNull(value, "String value should not be null");
> bytesRef.bytes = value.getBytes(StandardCharsets.UTF_8);
> bytesRef.length = value.length();
> {code}
>
> Specifically with respect to "large" fields handling.
> The length in utf8 bytes will often be longer than the string length
> `value.length()`, hence the truncation.
> h3. Fix
> {code:java}
> bytesRef.length = bytesRef.bytes.length {code}
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]