[ 
https://issues.apache.org/jira/browse/SOLR-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15604023#comment-15604023
 ] 

Joel Bernstein edited comment on SOLR-9559 at 10/25/16 3:48 AM:
----------------------------------------------------------------

All interesting questions.

I thought about *exec* and *eval* but settled on executor because it really is 
a work queue for streaming expressions. It's a powerful executor because it 
runs in parallel on a single node and can be parallelized across a cluster of 
worker nodes by wrapping it in the *parallel* function.
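
Roughly, the wrapped form looks like this (just a sketch: the workers and jobs 
collection names and the zkHost are placeholders, and the exact parameters 
should be checked against the streaming expression docs):

{code}
parallel(workers,
         executor(threads=10,
                  search(jobs, q="*:*", fl="id,expr_s", sort="id asc", qt="/export", partitionKeys="id")),
         workers="4",
         zkHost="localhost:9983",
         sort="id asc")
{code}

The inner search supplies the Tuples that hold the expressions, the executor 
compiles and runs each one, and parallel partitions the Tuples so that each 
worker node runs its own executor.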

The StreamTask's job is to iterate the stream. All functionality in streaming 
expressions is achieved by iterating the stream. For something interesting to 
happen in this scenario, you need to use a stream decorator that pushes data 
somewhere, such as the update() function. The update function pushes Tuples to 
another SolrCloud collection.
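
For example, a stored expression along these lines (a sketch only, with 
made-up collection and field names) would, when iterated by the executor, push 
the results of a search into another collection:

{code}
update(destination,
       batchSize=250,
       search(source, q="*:*", fl="id,field_s", sort="id asc", qt="/export"))
{code}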

For example, the executor could be used to train millions of machine learning 
models and store the models in a SolrCloud collection.

There are three core use cases for this:

1) As part of a scalable framework for developing Actor Model systems 
(https://en.wikipedia.org/wiki/Actor_model). This is one of the core features 
of Spark. The daemon function can be used to build Actors that interact with 
each other through work queues and mailboxes; a sketch follows this list.
2) Massively scalable stored queries and alerts. See the topic function for 
more details on subscribing to a query.
3) A general-purpose parallel executor / work queue.
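
To make the Actor idea concrete, here is a rough sketch of a single Actor (the 
collection names, fields, ids and interval are all made up): a daemon that 
subscribes to its mailbox with the topic function and pushes its output to 
another collection with update:

{code}
daemon(id="actor1",
       runInterval="2000",
       update(outbox,
              batchSize=50,
              topic(checkpoints, inbox, q="*:*", fl="id,body_t", id="actor1Topic")))
{code}

Another Actor can subscribe to outbox in the same way, so the collections act 
as the mailboxes / work queues between Actors.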

Error handling is currently just logging errors, but there is a lot we can do 
with error handling as this matures. One of the really nice things about the 
topic() function is that it persists its checkpoints in a collection. If you 
run a job that uses a topic() and it fails in the middle, you can simply start 
it back up and it picks up where it left off.
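
For instance (again just a sketch with made-up names), a job like this keeps 
its read position under the nightlyRun id in the checkpoints collection:

{code}
executor(threads=6,
         topic(checkpoints,
               jobs,
               q="*:*",
               fl="id,expr_s",
               id="nightlyRun"))
{code}

If the run dies partway through, submitting the same expression again picks up 
from the last persisted checkpoint rather than starting from scratch.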






> Add ExecutorStream to execute stored Streaming Expressions
> ----------------------------------------------------------
>
>                 Key: SOLR-9559
>                 URL: https://issues.apache.org/jira/browse/SOLR-9559
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>             Fix For: 6.3
>
>         Attachments: SOLR-9559.patch, SOLR-9559.patch, SOLR-9559.patch, 
> SOLR-9559.patch
>
>
> The *ExecutorStream* will wrap a stream which contains Tuples with Streaming 
> Expressions to execute. By default the ExecutorStream will look for the 
> expression in the *expr_s* field in the Tuples.
> The ExecutorStream will have an internal thread pool so expressions can be 
> executed in parallel on a single worker. The ExecutorStream can also be 
> wrapped by the parallel function to partition the Streaming Expressions that 
> need to be executed across a cluster of worker nodes.
> *Sample syntax*:
> {code}
> daemon(executor(threads=10, topic(storedExpressions, fl="expr_s", ...)))
> {code}
> In the example above, a *daemon* wraps an *executor*, which wraps a *topic* 
> that reads stored Streaming Expressions. The daemon calls the executor at 
> intervals, and each run executes all the expressions retrieved by the topic.


