[jira] [Comment Edited] (SOLR-9636) Add support for javabin for /stream, /sql internode communication

Joel Bernstein (JIRA) Tue, 27 Dec 2016 15:33:07 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781496#comment-15781496
 ]


Joel Bernstein edited comment on SOLR-9636 at 12/27/16 11:32 PM:
-----------------------------------------------------------------

I finally had the time to test out the javabin writer with the /export handler 
and streaming stack. My initial findings are really good. Here is a summary:

1) Currently testing must be done on branch_6x. There is a bug in master which 
breaks the /export handler. I haven't gotten to the bottom of it yet, but I'm 
pretty sure it was introduced with the new docValues iterator API, which is 
only in master. I will open a ticket for this bug shortly and see if I can fix 
the problem.

But testing in branch_6x is better anyway, as it avoids measuring the 
docValues iterator API performance at the same time as the javabin /export 
performance.

2) For my test I worked on a single Solr instance with a single data shard 
(collection1) loaded with 10,000,000 small documents. I also created a worker 
collection with 5 shards (collection2). Then I ran the following expression, 
with and without the javabin writer:
{code}
parallel(collection2, workers=5, sort="test_s desc", 
         rollup(over="test_s", sum(price_f),
                search(collection1, 
                       q=*:*,
                       fl="test_s, price_f", 
                       sort="test_s desc", 
                       qt="/export", 
                       wt="javabin", 
                       partitionKeys=test_s)))
{code}
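
For anyone who wants to repeat the with/without comparison, here is a minimal Python sketch that builds the same expression for each writer and the matching /stream request URL. The base URL and the helper names (build_expr, stream_url) are assumptions for illustration, as is the idea that the non-javabin run simply sets wt="json". Nothing is sent over the network; it only prints the URLs.

```python
from urllib.parse import urlencode


def build_expr(wt: str) -> str:
    """Return the test expression with the given response writer (hypothetical helper)."""
    return (
        'parallel(collection2, workers=5, sort="test_s desc", '
        'rollup(over="test_s", sum(price_f), '
        'search(collection1, q=*:*, fl="test_s, price_f", '
        'sort="test_s desc", qt="/export", '
        f'wt="{wt}", partitionKeys=test_s)))'
    )


def stream_url(base_url: str, wt: str) -> str:
    """Build a /stream URL on the worker collection; expr is URL-encoded."""
    return f"{base_url}/collection2/stream?" + urlencode({"expr": build_expr(wt)})


# Assumed local Solr address; adjust for your own install.
BASE = "http://localhost:8983/solr"

for wt in ("json", "javabin"):
    print(stream_url(BASE, wt))
```

Only the wt parameter changes between the two runs, so any difference in throughput can be attributed to the writer.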

Notice that there are five parallel workers (collection2) partitioning the 
stream from a single data shard (collection1). This is how you achieve maximum 
throughput from a single node.
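
To make the partitioning idea concrete, here is a rough Python sketch of hash partitioning on test_s across five workers, followed by a per-worker rollup. It is an illustration only: Solr does its own hashing, and the CRC32 hash, the sample tuples, and the function names below are all assumptions. The point is that every tuple with the same test_s value lands on the same worker, so each worker can sum price_f for its keys independently.

```python
import zlib
from collections import defaultdict

NUM_WORKERS = 5  # matches workers=5 in the expression


def worker_for(key: str, num_workers: int = NUM_WORKERS) -> int:
    """Map a partition key to a worker index (stand-in hash, not Solr's)."""
    return zlib.crc32(key.encode("utf-8")) % num_workers


def partition_and_rollup(tuples, num_workers: int = NUM_WORKERS):
    """Route each tuple to one worker by test_s, then sum price_f per key there."""
    per_worker = [defaultdict(float) for _ in range(num_workers)]
    for t in tuples:
        w = worker_for(t["test_s"], num_workers)
        per_worker[w][t["test_s"]] += t["price_f"]
    return per_worker


# Hypothetical sample data standing in for the /export stream.
sample = [
    {"test_s": "a", "price_f": 1.0},
    {"test_s": "b", "price_f": 2.0},
    {"test_s": "a", "price_f": 3.0},
    {"test_s": "c", "price_f": 4.0},
]

result = partition_and_rollup(sample)
for w, sums in enumerate(result):
    print(w, dict(sums))
```

Because no key is split across workers, the per-worker sums are already final and never need to be merged by key afterwards.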

3) Throughput numbers were fairly impressive with this example:

* With json writer: 900,000 Tuples per second.
* With javabin writer: 1,100,000 Tuples per second.

So Javabin gives a significant throughput boost. It's also nice to have an 
example of 1 million+ documents per second from a single node.
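
As a quick back-of-the-envelope check on those two figures, the arithmetic can be sketched in Python. This assumes the quoted rates hold across all 10,000,000 exported tuples, which is a simplification; the variable names are just for illustration.

```python
DOCS = 10_000_000        # documents in collection1
JSON_TPS = 900_000       # tuples per second with the json writer
JAVABIN_TPS = 1_100_000  # tuples per second with the javabin writer

speedup = JAVABIN_TPS / JSON_TPS
json_seconds = DOCS / JSON_TPS
javabin_seconds = DOCS / JAVABIN_TPS

print(f"throughput gain: {(speedup - 1) * 100:.1f}%")   # 22.2%
print(f"json export time: {json_seconds:.1f} s")        # 11.1 s
print(f"javabin export time: {javabin_seconds:.1f} s")  # 9.1 s
```

So under that assumption the javabin writer is roughly 22% faster, which saves about two seconds on a full pass over the 10,000,000 documents.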

4) Javabin also produced a much smaller output, roughly 50% smaller than json.

5) I also reviewed the code and it looks really nice. It's a big improvement 
as far as cleaning up the integration with Solr. 

6) The core export/sort algorithm looked to be untouched, which was nice 
because there was a lot of hardening on that in the past. My biggest concern 
going into this ticket was that refactoring would cause a change in the 
export/sort algorithm and we'd have to go through the hardening all over again. 
But that was not the case.










> Add support for javabin for /stream, /sql internode communication
> -----------------------------------------------------------------
>
>                 Key: SOLR-9636
>                 URL: https://issues.apache.org/jira/browse/SOLR-9636
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>             Fix For: master (7.0), 6.4
>
>         Attachments: SOLR-9636.patch
>
>
> currently it uses json, which is verbose and slow



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

