[ https://issues.apache.org/jira/browse/SOLR-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729268#comment-14729268 ]

Shalin Shekhar Mangar commented on SOLR-7971:
---------------------------------------------

I added Noble's patch to the JMH tests and I see the following results:
{code}
10 MB JSON
==========
Benchmark                                                  Mode  Cnt   Score   Error  Units
JavaBinCodecBenchmark.testDefaultWriteStr                 thrpt   30  28.846 ± 1.247  ops/s
JavaBinCodecBenchmark.testDirectBufferNoScratchWriteStr   thrpt   30  19.113 ± 0.426  ops/s
JavaBinCodecBenchmark.testDirectBufferWriteStr            thrpt   30  28.081 ± 0.943  ops/s
JavaBinCodecBenchmark.testDoublePassCountingOutputStream  thrpt   30  16.167 ± 0.145  ops/s
JavaBinCodecBenchmark.testDoublePassWriteStr              thrpt   30  22.230 ± 0.506  ops/s
JavaBinCodecBenchmark.testDoublePassWriteWithScratchStr   thrpt   30  24.608 ± 0.246  ops/s

100 MB JSON
===========
Benchmark                                                  Mode  Cnt  Score   Error  Units
JavaBinCodecBenchmark.testDefaultWriteStr                 thrpt   30  2.338 ± 0.163  ops/s
JavaBinCodecBenchmark.testDirectBufferNoScratchWriteStr   thrpt   30  1.762 ± 0.088  ops/s
JavaBinCodecBenchmark.testDirectBufferWriteStr            thrpt   30  2.934 ± 0.161  ops/s
JavaBinCodecBenchmark.testDoublePassCountingOutputStream  thrpt   30  1.613 ± 0.036  ops/s
JavaBinCodecBenchmark.testDoublePassWriteStr              thrpt   30  1.510 ± 0.186  ops/s
JavaBinCodecBenchmark.testDoublePassWriteWithScratchStr   thrpt   30  2.424 ± 0.079  ops/s
{code}
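
For context, these numbers come from standard JMH throughput runs. The harness shape is roughly the following; this is an illustrative sketch, not the actual JavaBinCodecBenchmark from the patch, and the "large.json" test file name is a placeholder:

{code}
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.concurrent.TimeUnit;

import org.apache.solr.common.util.JavaBinCodec;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

// Illustrative sketch of the benchmark shape only; the real
// JavaBinCodecBenchmark lives in the attached patches.
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS) // reported as ops/s, matching the tables above
@State(Scope.Benchmark)
public class WriteStrBenchmark {
  String largeJson;

  @Setup
  public void setup() throws Exception {
    // Load the 10 MB / 100 MB test JSON into a single String.
    largeJson = new String(Files.readAllBytes(Paths.get("large.json")),
        StandardCharsets.UTF_8);
  }

  @Benchmark
  public byte[] testDefaultWriteStr() throws Exception {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    new JavaBinCodec().marshal(largeJson, out);
    return out.toByteArray();
  }
}
{code}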

The CountingNullOutputStream approach is consistently slower than the others. 
Encoding through an intermediate scratch array is much faster than writing 
bytes directly to the output stream.
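
For reference, the scratch-array technique looks roughly like this: UTF-8 bytes are encoded into a small reusable buffer that is flushed to the stream whenever it gets close to full, so no allocation proportional to the string length is needed. This is a minimal sketch under that assumption, not the code from any of the attached patches; the class name, method name, and buffer size are illustrative:

{code}
import java.io.IOException;
import java.io.OutputStream;

// Minimal sketch: stream a String to an OutputStream as UTF-8 through a
// small reusable scratch buffer, instead of allocating a byte[] sized to
// the whole encoded string. Names and buffer size are illustrative.
class ScratchUtf8Writer {
  private static final int SCRATCH_SIZE = 8192;
  private final byte[] scratch = new byte[SCRATCH_SIZE];

  public void writeUtf8(String s, OutputStream out) throws IOException {
    int upto = 0;
    for (int i = 0; i < s.length(); i++) {
      // A char needs at most 3 bytes; a surrogate pair needs 4 but consumes
      // two chars. Flush before the scratch buffer could overflow.
      if (upto > SCRATCH_SIZE - 4) {
        out.write(scratch, 0, upto);
        upto = 0;
      }
      char ch = s.charAt(i);
      if (ch < 0x80) {                        // 1 byte: ASCII
        scratch[upto++] = (byte) ch;
      } else if (ch < 0x800) {                // 2 bytes
        scratch[upto++] = (byte) (0xC0 | (ch >> 6));
        scratch[upto++] = (byte) (0x80 | (ch & 0x3F));
      } else if (Character.isHighSurrogate(ch) && i + 1 < s.length()
          && Character.isLowSurrogate(s.charAt(i + 1))) {
        // 4 bytes: supplementary code point from a surrogate pair
        int cp = Character.toCodePoint(ch, s.charAt(++i));
        scratch[upto++] = (byte) (0xF0 | (cp >> 18));
        scratch[upto++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
        scratch[upto++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
        scratch[upto++] = (byte) (0x80 | (cp & 0x3F));
      } else {                                // 3 bytes: rest of the BMP
        // Note: an unpaired surrogate is encoded as-is here; a production
        // encoder should substitute U+FFFD instead.
        scratch[upto++] = (byte) (0xE0 | (ch >> 12));
        scratch[upto++] = (byte) (0x80 | ((ch >> 6) & 0x3F));
        scratch[upto++] = (byte) (0x80 | (ch & 0x3F));
      }
    }
    out.write(scratch, 0, upto);              // flush the tail
  }
}
{code}

JavaBin writes the encoded length before the string bytes, which is presumably what the double-pass variants compute on their first pass; the scratch buffer then keeps the second (writing) pass allocation-free.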

> Reduce memory allocated by JavaBinCodec to encode large strings
> ---------------------------------------------------------------
>
>                 Key: SOLR-7971
>                 URL: https://issues.apache.org/jira/browse/SOLR-7971
>             Project: Solr
>          Issue Type: Sub-task
>          Components: Response Writers, SolrCloud
>            Reporter: Shalin Shekhar Mangar
>            Assignee: Shalin Shekhar Mangar
>            Priority: Minor
>             Fix For: Trunk, 5.4
>
>         Attachments: SOLR-7971-directbuffer.patch, 
> SOLR-7971-directbuffer.patch, SOLR-7971-directbuffer.patch, 
> SOLR-7971-doublepass.patch, SOLR-7971-doublepass.patch, SOLR-7971.patch
>
>
> As discussed in SOLR-7927, we can reduce the buffer memory allocated by 
> JavaBinCodec while writing large strings.
> https://issues.apache.org/jira/browse/SOLR-7927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700420#comment-14700420
> {quote}
> The maximum Unicode code point (as of Unicode 8 anyway) is U+10FFFF 
> ([http://www.unicode.org/glossary/#code_point]).  This is encoded in UTF-16 
> as surrogate pair {{\uDBFF\uDFFF}}, which takes up two Java chars, and is 
> represented in UTF-8 as the 4-byte sequence {{F4 8F BF BF}}.  This is likely 
> where the mistaken 4-bytes-per-Java-char formulation came from: the maximum 
> number of UTF-8 bytes required to represent a Unicode *code point* is 4.
> The maximum Java char is {{\uFFFF}}, which is represented in UTF-8 as the 
> 3-byte sequence {{EF BF BF}}.
> So I think it's safe to switch to using 3 bytes per Java char (the unit of 
> measurement returned by {{String.length()}}), like 
> {{CompressingStoredFieldsWriter.writeField()}} does.
> {quote}
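
The byte counts in the quoted comment are easy to check against the JDK's own encoder; a quick standalone verification (illustrative, not part of any patch):

{code}
import java.nio.charset.StandardCharsets;

public class Utf8BoundCheck {
  public static void main(String[] args) {
    // Largest single Java char, U+FFFF: encodes to 3 UTF-8 bytes (EF BF BF).
    System.out.println("\uFFFF".getBytes(StandardCharsets.UTF_8).length);       // 3
    // Largest code point U+10FFFF as the surrogate pair \uDBFF\uDFFF:
    // two Java chars encode to 4 UTF-8 bytes (F4 8F BF BF), i.e. 2 bytes/char.
    System.out.println("\uDBFF\uDFFF".getBytes(StandardCharsets.UTF_8).length); // 4
    // Hence String.length() * 3 is a safe upper bound on the UTF-8 size.
  }
}
{code}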


