[jira] [Commented] (SOLR-14013) javabin performance regressions

Yonik Seeley (Jira) Sun, 08 Dec 2019 15:03:27 -0800


    [ https://issues.apache.org/jira/browse/SOLR-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16991014#comment-16991014 ]


Yonik Seeley commented on SOLR-14013:
-------------------------------------

I worked up a quick-and-dirty patch that disables the charseq optimization, to 
test my hypothesis that it is behind the slower indexing speed:
{code}
git diff
diff --git a/solr/core/src/java/org/apache/solr/handler/component/HttpShardHandler.java b/solr/core/src/java/org/apache/solr/handler/component/HttpShardHandler.java
index 69da3948fe9..620fffb1303 100644
--- a/solr/core/src/java/org/apache/solr/handler/component/HttpShardHandler.java
+++ b/solr/core/src/java/org/apache/solr/handler/component/HttpShardHandler.java
@@ -146,7 +146,7 @@ public class HttpShardHandler extends ShardHandler {
   private static final BinaryResponseParser READ_STR_AS_CHARSEQ_PARSER = new BinaryResponseParser() {
     @Override
     protected JavaBinCodec createCodec() {
-      return new JavaBinCodec(null, stringCache).setReadStringAsCharSeq(true);
+      return new JavaBinCodec(null, stringCache).setReadStringAsCharSeq(false);
     }
   };

diff --git a/solr/core/src/java/org/apache/solr/response/DocsStreamer.java b/solr/core/src/java/org/apache/solr/response/DocsStreamer.java
index 3d1976e143c..056dc08d963 100644
--- a/solr/core/src/java/org/apache/solr/response/DocsStreamer.java
+++ b/solr/core/src/java/org/apache/solr/response/DocsStreamer.java
@@ -148,9 +148,7 @@ public class DocsStreamer implements Iterator<SolrDocument> {
     // because that doesn't include extra fields needed by transformers
     final Set<String> fieldNamesNeeded = fields.getLuceneFieldNames();

-    final SolrDocument out = ResultContext.READASBYTES.get() == null ?
-        new SolrDocument() :
-        new BinaryResponseWriter.MaskCharSeqSolrDocument();
+    final SolrDocument out = new SolrDocument();

     // NOTE: it would be tempting to try and optimize this to loop over fieldNamesNeeded
     // when it's smaller then the IndexableField[] in the Document -- but that's actually *less* effecient
diff --git a/solr/solrj/src/java/org/apache/solr/common/util/ByteArrayUtf8CharSequence.java b/solr/solrj/src/java/org/apache/solr/common/util/ByteArrayUtf8CharSequence.java
index 7a4abe2c303..53cfbee320f 100644
--- a/solr/solrj/src/java/org/apache/solr/common/util/ByteArrayUtf8CharSequence.java
+++ b/solr/solrj/src/java/org/apache/solr/common/util/ByteArrayUtf8CharSequence.java
@@ -209,8 +209,11 @@ public class ByteArrayUtf8CharSequence implements Utf8CharSequence {
     }
     return vals;
   }
-
   public static Object convertCharSeq(Object o) {
+    return o; // nocommit
+  }
+
+  public static Object _convertCharSeq(Object o) {
     if (o == null) return null;
     if (o instanceof Utf8CharSequence) return ((Utf8CharSequence) o).toString();
     if (o instanceof Collection) return convertCharSeq((Collection) o);
{code}
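
For readers who don't have the source open: the last hunk turns convertCharSeq into a no-op. Below is a rough, self-contained sketch of the kind of per-value work that gets skipped, based only on the three branches visible in the diff (null check, toString on a UTF-8 backed value, recursion into collections). Utf8Bytes and convert are hypothetical stand-ins, not Solr classes, and the collection handling is an assumption because that overload's body is not shown in the patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class CharSeqConversionSketch {
    /** Hypothetical stand-in for a UTF-8 backed value; this is NOT Solr's class. */
    static final class Utf8Bytes {
        final byte[] bytes;

        Utf8Bytes(String s) {
            this.bytes = s.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public String toString() {
            // Decoding allocates a new String on every call.
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    /** Converts Utf8Bytes values (also inside collections) to String; everything else passes through. */
    static Object convert(Object o) {
        if (o == null) return null;
        if (o instanceof Utf8Bytes) return o.toString();
        if (o instanceof Collection) {
            // Assumed behavior: copy the collection, converting each element.
            List<Object> out = new ArrayList<>();
            for (Object item : (Collection<?>) o) {
                out.add(convert(item));
            }
            return out;
        }
        return o;
    }

    public static void main(String[] args) {
        List<Object> values =
            Arrays.asList(new Utf8Bytes("a"), 42, Arrays.asList(new Utf8Bytes("b")));
        System.out.println(convert(values));
    }
}
```

The point of the sketch is only that every value pays for a type check, and string values pay for a decode plus an allocation, so the cost grows with the number of fields and values per document.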

I also hacked up the unit test I used to find the N^2 issue.
It's obviously not good for benchmarking (being a unit test, etc.), but it's good 
enough to detect anything major.
I tested with a single value per string field (and many fields per doc). It 
would be worse with multiple values per field.

Results:
===================== master, single valued string fields
 [junit4] 2> INDEX TIME=10293
 [junit4] 2> QUERY TIME=891 xml
 [junit4] 2> QUERY TIME=415 javabin
 [junit4] 2> QUERY TIME=600 json

 [junit4] 2> INDEX TIME=10313
 [junit4] 2> QUERY TIME=872 xml
 [junit4] 2> QUERY TIME=389 javabin
 [junit4] 2> QUERY TIME=579 json

 [junit4] 2> INDEX TIME=10307
 [junit4] 2> QUERY TIME=858 xml
 [junit4] 2> QUERY TIME=410 javabin
 [junit4] 2> QUERY TIME=570 json

 [junit4] 2> INDEX TIME=10318
 [junit4] 2> QUERY TIME=915 xml
 [junit4] 2> QUERY TIME=382 javabin
 [junit4] 2> QUERY TIME=600 json

 [junit4] 2> INDEX TIME=10579
 [junit4] 2> QUERY TIME=843 xml
 [junit4] 2> QUERY TIME=386 javabin
 [junit4] 2> QUERY TIME=570 json

===================== patch disabling charseq stuff, single valued string fields
   [junit4]   2> INDEX TIME=8547
   [junit4]   2> QUERY TIME=881 xml
   [junit4]   2> QUERY TIME=396 javabin
   [junit4]   2> QUERY TIME=576 json

   [junit4]   2> INDEX TIME=9428
   [junit4]   2> QUERY TIME=821 xml
   [junit4]   2> QUERY TIME=374 javabin
   [junit4]   2> QUERY TIME=543 json

   [junit4]   2> INDEX TIME=9181
   [junit4]   2> QUERY TIME=812 xml
   [junit4]   2> QUERY TIME=382 javabin
   [junit4]   2> QUERY TIME=533 json

   [junit4]   2> INDEX TIME=9455
   [junit4]   2> QUERY TIME=863 xml
   [junit4]   2> QUERY TIME=395 javabin
   [junit4]   2> QUERY TIME=613 json

   [junit4]   2> INDEX TIME=9530
   [junit4]   2> QUERY TIME=863 xml
   [junit4]   2> QUERY TIME=385 javabin
   [junit4]   2> QUERY TIME=559 json

So the charseq stuff (or rather, probably the extra work to 
auto-convert-to-string) did cause slower indexing speed: INDEX TIME was 
10293-10579 on master vs. 8547-9530 with the patch.
There is enough noise that I don't think one can draw any conclusions about 
query speed.
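
To put a rough number on the indexing difference, here is a small sketch that averages the INDEX TIME values from the five runs on each side. The class and method names are made up for illustration; the only data used are the figures printed above, in whatever units the test reports.

```java
import java.util.Arrays;

public class IndexTimeSummary {
    // INDEX TIME values copied from the runs above (units as printed by the test).
    static final long[] MASTER = {10293, 10313, 10307, 10318, 10579};
    static final long[] PATCHED = {8547, 9428, 9181, 9455, 9530};

    /** Arithmetic mean of the samples; NaN if the array is empty. */
    static double mean(long[] xs) {
        return Arrays.stream(xs).average().orElse(Double.NaN);
    }

    /** Relative reduction of the patched mean versus the master mean, in percent. */
    static double percentLower(long[] master, long[] patched) {
        double m = mean(master);
        return 100.0 * (m - mean(patched)) / m;
    }

    public static void main(String[] args) {
        System.out.printf("master mean=%.1f, patched mean=%.1f%n", mean(MASTER), mean(PATCHED));
        System.out.printf("patched is %.1f%% lower%n", percentLower(MASTER, PATCHED));
    }
}
```

The means work out to 10362.0 on master and 9228.2 with the patch, i.e. the patched runs are roughly 11% lower. With only five runs each from a unit test, this is an indication rather than a benchmark result.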





> javabin performance regressions
> -------------------------------
>
>                 Key: SOLR-14013
>                 URL: https://issues.apache.org/jira/browse/SOLR-14013
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 7.7
>            Reporter: Yonik Seeley
>            Assignee: Yonik Seeley
>            Priority: Major
>         Attachments: test.json
>
>
> As noted by [~rrockenbaugh] in SOLR-13963, javabin also recently became 
> orders of magnitude slower in certain cases since v7.7.  The cases identified 
> so far include large numbers of values in a field.


