[
https://issues.apache.org/jira/browse/SOLR-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729268#comment-14729268
]
Shalin Shekhar Mangar commented on SOLR-7971:
---------------------------------------------
I added Noble's patch to the JMH tests and I see the following results:
{code}
10 MB JSON
==========
Benchmark                                                  Mode  Cnt   Score   Error  Units
JavaBinCodecBenchmark.testDefaultWriteStr                 thrpt   30  28.846 ± 1.247  ops/s
JavaBinCodecBenchmark.testDirectBufferNoScratchWriteStr   thrpt   30  19.113 ± 0.426  ops/s
JavaBinCodecBenchmark.testDirectBufferWriteStr            thrpt   30  28.081 ± 0.943  ops/s
JavaBinCodecBenchmark.testDoublePassCountingOutputStream  thrpt   30  16.167 ± 0.145  ops/s
JavaBinCodecBenchmark.testDoublePassWriteStr              thrpt   30  22.230 ± 0.506  ops/s
JavaBinCodecBenchmark.testDoublePassWriteWithScratchStr   thrpt   30  24.608 ± 0.246  ops/s

100 MB JSON
===========
Benchmark                                                  Mode  Cnt   Score   Error  Units
JavaBinCodecBenchmark.testDefaultWriteStr                 thrpt   30   2.338 ± 0.163  ops/s
JavaBinCodecBenchmark.testDirectBufferNoScratchWriteStr   thrpt   30   1.762 ± 0.088  ops/s
JavaBinCodecBenchmark.testDirectBufferWriteStr            thrpt   30   2.934 ± 0.161  ops/s
JavaBinCodecBenchmark.testDoublePassCountingOutputStream  thrpt   30   1.613 ± 0.036  ops/s
JavaBinCodecBenchmark.testDoublePassWriteStr              thrpt   30   1.510 ± 0.186  ops/s
JavaBinCodecBenchmark.testDoublePassWriteWithScratchStr   thrpt   30   2.424 ± 0.079  ops/s
{code}
The CountingNullOutputStream approach is consistently slower than the others.
Writing through an intermediate scratch array is much faster than writing
each encoded byte directly to the output stream.
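For reference, a minimal sketch of the scratch-array idea (illustrative only,
not the attached patch; the class name and the 8 KB buffer size are made up):
chars are UTF-8-encoded into a small reusable byte[] that is flushed to the
stream a chunk at a time, so no buffer proportional to the string length is
ever allocated.
{code}
import java.io.IOException;
import java.io.OutputStream;

/** Illustrative sketch: chunked UTF-8 encoding through a reusable scratch array. */
public class ScratchUtf8Writer {
  private final byte[] scratch = new byte[8192]; // assumed chunk size

  public void writeStr(String s, OutputStream out) throws IOException {
    int upto = 0;
    for (int i = 0; i < s.length(); i++) {
      // A code point needs at most 4 UTF-8 bytes; flush the chunk when the
      // scratch array cannot hold another worst-case sequence.
      if (upto > scratch.length - 4) {
        out.write(scratch, 0, upto);
        upto = 0;
      }
      char c = s.charAt(i);
      if (c < 0x80) {                       // 1 byte: ASCII
        scratch[upto++] = (byte) c;
      } else if (c < 0x800) {               // 2 bytes
        scratch[upto++] = (byte) (0xC0 | (c >> 6));
        scratch[upto++] = (byte) (0x80 | (c & 0x3F));
      } else if (Character.isHighSurrogate(c) && i + 1 < s.length()
          && Character.isLowSurrogate(s.charAt(i + 1))) {
        // Surrogate pair: one supplementary code point, 4 bytes.
        int cp = Character.toCodePoint(c, s.charAt(++i));
        scratch[upto++] = (byte) (0xF0 | (cp >> 18));
        scratch[upto++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
        scratch[upto++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
        scratch[upto++] = (byte) (0x80 | (cp & 0x3F));
      } else {                              // 3 bytes: rest of the BMP
        // (An unpaired surrogate falls through here; a production encoder
        // would substitute U+FFFD rather than emit invalid UTF-8.)
        scratch[upto++] = (byte) (0xE0 | (c >> 12));
        scratch[upto++] = (byte) (0x80 | ((c >> 6) & 0x3F));
        scratch[upto++] = (byte) (0x80 | (c & 0x3F));
      }
    }
    if (upto > 0) {
      out.write(scratch, 0, upto); // flush the final partial chunk
    }
  }
}
{code}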
> Reduce memory allocated by JavaBinCodec to encode large strings
> ---------------------------------------------------------------
>
> Key: SOLR-7971
> URL: https://issues.apache.org/jira/browse/SOLR-7971
> Project: Solr
> Issue Type: Sub-task
> Components: Response Writers, SolrCloud
> Reporter: Shalin Shekhar Mangar
> Assignee: Shalin Shekhar Mangar
> Priority: Minor
> Fix For: Trunk, 5.4
>
> Attachments: SOLR-7971-directbuffer.patch,
> SOLR-7971-directbuffer.patch, SOLR-7971-directbuffer.patch,
> SOLR-7971-doublepass.patch, SOLR-7971-doublepass.patch, SOLR-7971.patch
>
>
> As discussed in SOLR-7927, we can reduce the buffer memory allocated by
> JavaBinCodec while writing large strings.
> https://issues.apache.org/jira/browse/SOLR-7927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700420#comment-14700420
> {quote}
> The maximum Unicode code point (as of Unicode 8 anyway) is U+10FFFF
> ([http://www.unicode.org/glossary/#code_point]). This is encoded in UTF-16
> as surrogate pair {{\uDBFF\uDFFF}}, which takes up two Java chars, and is
> represented in UTF-8 as the 4-byte sequence {{F4 8F BF BF}}. This is likely
> where the mistaken 4-bytes-per-Java-char formulation came from: the maximum
> number of UTF-8 bytes required to represent a Unicode *code point* is 4.
> The maximum Java char is {{\uFFFF}}, which is represented in UTF-8 as the
> 3-byte sequence {{EF BF BF}}.
> So I think it's safe to switch to using 3 bytes per Java char (the unit of
> measurement returned by {{String.length()}}), like
> {{CompressingStoredFieldsWriter.writeField()}} does.
> {quote}
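To illustrate the quoted bound with a standalone snippet (not from any of the
patches): the largest Java char needs 3 UTF-8 bytes, while the largest code
point needs 4 bytes but already occupies two Java chars, so 3 *
{{String.length()}} is always a safe upper bound.
{code}
import java.nio.charset.StandardCharsets;

public class Utf8Bounds {
  public static void main(String[] args) {
    String maxChar = "\uFFFF";              // largest single Java char
    String maxCodePoint = "\uDBFF\uDFFF";   // U+10FFFF as a surrogate pair

    // \uFFFF encodes to EF BF BF: 3 bytes for 1 char.
    System.out.println(maxChar.getBytes(StandardCharsets.UTF_8).length);      // 3
    // U+10FFFF encodes to F4 8F BF BF: 4 bytes, but String.length() is 2,
    // so this is only 2 bytes per Java char.
    System.out.println(maxCodePoint.length());                                // 2
    System.out.println(maxCodePoint.getBytes(StandardCharsets.UTF_8).length); // 4
  }
}
{code}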