[https://issues.apache.org/jira/browse/SOLR-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781496#comment-15781496]
Joel Bernstein edited comment on SOLR-9636 at 12/27/16 11:32 PM:
-----------------------------------------------------------------
I finally had the time to test out the javabin writer with the /export handler
and streaming stack. My initial findings are really good. Here is a summary:
1) Currently testing must be done on branch_6x. There is a bug in master which
breaks the /export handler. I haven't gotten to the bottom of it yet, but I'm
pretty sure it was introduced with the new docValues iterator API, which is only
in master. I will open a ticket for this bug shortly and see if I can fix the
problem.
But testing in branch_6x is better anyway, as it avoids measuring the docValues
iterator API performance at the same time as the javabin /export performance.
2) For my test I used a single Solr instance with a single data shard
(collection1) loaded with 10,000,000 small documents. I also created a worker
collection with 5 shards (collection2). Then I ran the following expression with
and without the javabin writer.
{code}
parallel(collection2, workers=5, sort="test_s desc",
rollup(over="test_s", sum(price_f),
search(collection1,
q=*:*,
fl="test_s, price_f",
sort="test_s desc",
qt="/export",
wt="javabin",
partitionKeys=test_s)))
{code}
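The rollup in the expression above can aggregate in a single pass because its input arrives sorted by test_s (both search and parallel use sort="test_s desc"), so every group is a run of consecutive tuples with the same key. The following is a minimal plain-Java sketch of that idea (Java 16+), not Solr's RollupStream class; the Tuple record and the rollupSum method are hypothetical names used only for illustration.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Sketch of rollup(over="test_s", sum(price_f)) semantics over a stream
 * that is already sorted by test_s. Hypothetical illustration only.
 */
public class RollupSketch {

    /** Hypothetical tuple carrying the two fields from the expression. */
    record Tuple(String testS, float priceF) {}

    /** Emits one (key, sum) pair per run of equal keys, in input order. */
    static List<Map.Entry<String, Double>> rollupSum(List<Tuple> sortedTuples) {
        List<Map.Entry<String, Double>> out = new ArrayList<>();
        String currentKey = null;
        double runningSum = 0.0;
        for (Tuple t : sortedTuples) {
            if (currentKey != null && !currentKey.equals(t.testS())) {
                // Key changed: the previous group is complete, so emit it.
                out.add(Map.entry(currentKey, runningSum));
                runningSum = 0.0;
            }
            currentKey = t.testS();
            runningSum += t.priceF();
        }
        if (currentKey != null) {
            // Flush the last group.
            out.add(Map.entry(currentKey, runningSum));
        }
        return out;
    }

    public static void main(String[] args) {
        // Input sorted by test_s desc, matching sort="test_s desc".
        List<Tuple> input = List.of(
            new Tuple("c", 1.0f), new Tuple("c", 2.5f),
            new Tuple("b", 4.0f),
            new Tuple("a", 0.5f), new Tuple("a", 0.5f));
        System.out.println(rollupSum(input));
    }
}
{code}

Only one group is held in memory at a time, which is why the sort order matters: unsorted input would split a key into several runs and emit several partial sums.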
Notice that there are five parallel workers (collection2) partitioning the
stream from a single data shard (collection1). This is how you achieve maximum
throughput from a single node.
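The partitioning described above relies on partitionKeys=test_s: each worker receives only the tuples whose test_s value hashes to it, so all tuples sharing a key reach the same worker and each worker's rollup sees complete groups. The sketch below illustrates this with hash-mod-N assignment; String.hashCode() is a stand-in chosen for illustration, not the hash function Solr actually uses, and PartitionSketch, workerFor and partition are hypothetical names.

{code}
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of hash partitioning a sorted stream across N workers by key,
 * in the spirit of partitionKeys=test_s. Hypothetical illustration only.
 */
public class PartitionSketch {

    /** Returns the worker (0..workers-1) that owns this key. */
    static int workerFor(String key, int workers) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), workers);
    }

    /** Splits keys into per-worker lists, preserving input (sort) order within each list. */
    static List<List<String>> partition(List<String> sortedKeys, int workers) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            parts.add(new ArrayList<>());
        }
        for (String key : sortedKeys) {
            parts.get(workerFor(key, workers)).add(key);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Keys sorted desc, as they would arrive with sort="test_s desc".
        List<String> keys = List.of("e", "d", "d", "c", "b", "b", "a");
        List<List<String>> parts = partition(keys, 5);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("worker " + i + ": " + parts.get(i));
        }
    }
}
{code}

Because the assignment depends only on the key, equal keys never straddle two workers, and because each partition keeps the input order, every worker still sees a sorted stream.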
3) Throughput numbers were fairly impressive with this example:
* With JSON writer: 900,000 Tuples per second.
* With javabin writer: 1,100,000 Tuples per second.
So javabin gives a significant throughput boost (about 22% more tuples per
second). It's also nice to have an example of 1 million+ documents per second
from a single node.
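The throughput comparison can be checked with a little back-of-the-envelope arithmetic. The sketch below uses only the two reported rates and the 10,000,000-document test set; the per-pass timings assume the rate stays constant for a whole pass, which is a simplifying assumption rather than a measured result.

{code}
/**
 * Back-of-the-envelope arithmetic on the reported throughput numbers.
 */
public class ThroughputRatio {

    /** Ratio of new throughput to baseline throughput (1.0 means no change). */
    static double speedup(double baselinePerSec, double newPerSec) {
        return newPerSec / baselinePerSec;
    }

    /** Seconds needed to stream `docs` tuples at a steady `perSec` rate. */
    static double secondsFor(long docs, double perSec) {
        return docs / perSec;
    }

    public static void main(String[] args) {
        double jsonPerSec = 900_000;      // reported JSON writer throughput
        double javabinPerSec = 1_100_000; // reported javabin writer throughput
        long docs = 10_000_000L;          // size of collection1 in the test

        double ratio = speedup(jsonPerSec, javabinPerSec);
        System.out.printf("speedup: %.3fx (+%.1f%%)%n", ratio, (ratio - 1) * 100);
        System.out.printf("json:    %.1f s for %d tuples%n", secondsFor(docs, jsonPerSec), docs);
        System.out.printf("javabin: %.1f s for %d tuples%n", secondsFor(docs, javabinPerSec), docs);
    }
}
{code}

This works out to a speedup of about 1.22x, and roughly 11.1 s versus 9.1 s for one pass over the 10,000,000 documents under the constant-rate assumption.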
4) Javabin also produced much smaller output, roughly 50% smaller than JSON.
5) I also reviewed the code, and it looks really nice. It's a big improvement as
far as cleaning up the integration with Solr.
6) The core export/sort algorithm appears to be untouched, which is nice because
a lot of hardening went into it in the past. My biggest concern going into this
ticket was that the refactoring would change the export/sort algorithm and we'd
have to go through the hardening all over again. But that was not the case.
> Add support for javabin for /stream, /sql internode communication
> -----------------------------------------------------------------
>
> Key: SOLR-9636
> URL: https://issues.apache.org/jira/browse/SOLR-9636
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Noble Paul
> Assignee: Noble Paul
> Fix For: master (7.0), 6.4
>
> Attachments: SOLR-9636.patch
>
>
> currently it uses json, which is verbose and slow
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)