[jira] [Comment Edited] (SOLR-9684) Add schedule Streaming Expression

Joel Bernstein (JIRA) Sun, 01 Jan 2017 18:58:27 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-9684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15792043#comment-15792043
 ]


Joel Bernstein edited comment on SOLR-9684 at 1/2/17 2:57 AM:
--------------------------------------------------------------

Ok, then let's go with *priority* as the name for this function.

About the *merge* function: merge is shorthand for "mergeSort". 
It's designed to merge two streams sorted on the same keys and maintain the 
sort order. Originally the idea was that the /export handler was a giant 
sorting engine, and merge was a way to efficiently merge the sorted streams.
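
To illustrate the merge-sort behavior described above, here is a minimal Python sketch. It is not the Solr implementation; the `merge_sorted` helper and the sample tuples are hypothetical, and the only assumption is that both inputs are already sorted on the same key.

```python
import heapq
from operator import itemgetter


def merge_sorted(left, right, key):
    """Lazily merge two iterables that are already sorted on `key`.

    The output stays sorted on `key`, and neither input is re-sorted.
    """
    return heapq.merge(left, right, key=key)


# Hypothetical tuples from two sorted streams (for example, two sorted exports).
stream_a = [{"id": 1}, {"id": 4}, {"id": 7}]
stream_b = [{"id": 2}, {"id": 3}, {"id": 9}]

merged_ids = [t["id"] for t in merge_sorted(stream_a, stream_b, key=itemgetter("id"))]
print(merged_ids)
```

Because each input is read in order and only the current heads are compared, the merge never needs to buffer or re-sort a whole stream.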

The priority function behaves more like SQL UNION ALL, but it's different in 
that *priority* only picks one stream to iterate on each open/close. This 
design allows it to iterate the high priority topic, and only iterate the lower 
priority topic when no new higher priority tasks have entered the index. 
Because topics work in small batches, new high priority tasks will jump ahead 
of existing lower priority tasks on each executor run.
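
The selection rule described above can be sketched in Python roughly as follows. This is a simulation under stated assumptions, not Solr code: the `Topic` class, its `batch_size`, and `priority_run` are hypothetical stand-ins for topic() streams and for one open/close cycle of the priority function.

```python
from collections import deque


class Topic:
    """Hypothetical stand-in for a topic(): hands out queued tasks in small batches."""

    def __init__(self, tasks, batch_size=2):
        self.pending = deque(tasks)
        self.batch_size = batch_size

    def add(self, task):
        # A new task entering the index.
        self.pending.append(task)

    def read_batch(self):
        # Return up to batch_size tasks, or an empty list if nothing is new.
        n = min(self.batch_size, len(self.pending))
        return [self.pending.popleft() for _ in range(n)]


def priority_run(high, low):
    """One open/close cycle: iterate exactly one stream, preferring `high`."""
    batch = high.read_batch()
    if batch:
        return batch
    return low.read_batch()


high = Topic(["h1", "h2", "h3"])
low = Topic(["l1", "l2", "l3"])

runs = []
runs.append(priority_run(high, low))  # high has work
runs.append(priority_run(high, low))  # high still has work
runs.append(priority_run(high, low))  # high is empty, so low gets a turn
high.add("h4")                        # a new high priority task arrives
runs.append(priority_run(high, low))  # it jumps ahead of the remaining low task
runs.append(priority_run(high, low))  # back to the low priority topic
print(runs)
```

Each run returns tasks from only one of the two topics, which is the difference from a plain UNION ALL that would concatenate both.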

Also, I think the *merge* function fits into the relational algebra category. 
The *priority* function is mainly going to be used for task prioritization and 
execution.

Eventually we'll need to implement both a UnionStream and UnionAllStream as 
well.




> Add schedule Streaming Expression
> ---------------------------------
>
>                 Key: SOLR-9684
>                 URL: https://issues.apache.org/jira/browse/SOLR-9684
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>             Fix For: master (7.0), 6.4
>
>         Attachments: SOLR-9684.patch, SOLR-9684.patch, SOLR-9684.patch
>
>
> SOLR-9559 adds a general purpose *parallel task executor* for streaming 
> expressions. The executor() function executes a stream of tasks and doesn't 
> have any concept of task priority.
> The schedule() function wraps two streams, a high priority stream and a low 
> priority stream. The schedule function emits tuples from the high priority 
> stream first, and then the low priority stream.
> The executor() function can then wrap the schedule function to see tasks in 
> priority order.
> Pseudo syntax:
> {code}
> daemon(executor(schedule(topic(tasks, q="priority:high"),
>                          topic(tasks, q="priority:low"))))
> {code}


