[ 
https://issues.apache.org/jira/browse/SAMZA-2?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated SAMZA-2:
--------------------------------

    Attachment: SAMZA-2.3.patch

Attaching a new patch.

Per latest comments on RB:

1. Removed the task.chooser.wrapper.class config.
2. Renamed DefaultChooser to WrappedChooser.
3. Forced SamzaContainer to always use WrappedChooser.
4. WrappedChooser now uses SystemAdmin map wired in from SamzaContainer, rather 
than instantiating its own SystemAdmins.
5. Removed PriorityChooser.
6. Changed class composition from bootstrap/priority/batch to 
bootstrap/batch/priority.

Regarding (3), [~sriramsub] makes a good case on the RB for doing this.

Regarding (5), PriorityChooser was initially created as a helper class to make 
it easy for people to implement their own choosers. We expected folks to 
implement their own choosers frequently, since we provided little 
functionality out of the box. Now that WrappedChooser provides most of that 
functionality itself, there's much less need for this class, so I'm removing 
it.

Regarding (6), there's some discussion on the RB about this.
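
To make the ordering in (6) a bit more concrete, here is a rough sketch of 
what a bootstrap/batch/priority composition could look like, assuming the 
slash-separated list reads from the outermost wrapper to the innermost one. 
Everything in it is a hypothetical simplification and not the code in the 
patch: the Envelope and Chooser types, the three *Layer classes, the backlog 
counts (standing in for whatever offset information the SystemAdmins 
provide), and the rule that a caller offers at most one outstanding envelope 
per stream.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types are the real Samza classes.
final class ChooserCompositionSketch {
    static final class Envelope {
        final String stream, payload;
        Envelope(String stream, String payload) { this.stream = stream; this.payload = payload; }
    }

    interface Chooser {
        void update(Envelope e); // offer an envelope (assumed: one outstanding per stream)
        Envelope choose();       // next envelope to process, or null if none is eligible
    }

    // Innermost layer: pick the pending envelope whose stream has the highest priority.
    static final class PriorityLayer implements Chooser {
        private final Map<String, Integer> priorities;
        private final Map<String, Envelope> pending = new HashMap<>();
        PriorityLayer(Map<String, Integer> priorities) { this.priorities = priorities; }
        public void update(Envelope e) { pending.put(e.stream, e); }
        public Envelope choose() {
            Envelope best = null;
            for (Envelope e : pending.values()) {
                if (best == null || prio(e) > prio(best)) best = e;
            }
            if (best != null) pending.remove(best.stream);
            return best;
        }
        private int prio(Envelope e) { return priorities.getOrDefault(e.stream, 0); }
    }

    // Middle layer: stay on the last chosen stream for up to batchSize consecutive picks.
    static final class BatchLayer implements Chooser {
        private final Chooser inner;
        private final int batchSize;
        private String preferred;
        private Envelope preferredEnvelope;
        private int taken;
        BatchLayer(Chooser inner, int batchSize) { this.inner = inner; this.batchSize = batchSize; }
        public void update(Envelope e) {
            if (e.stream.equals(preferred)) preferredEnvelope = e; else inner.update(e);
        }
        public Envelope choose() {
            if (preferredEnvelope != null && taken < batchSize) {
                Envelope e = preferredEnvelope;
                preferredEnvelope = null;
                taken++;
                return e;
            }
            // Batch is full or the preferred stream has nothing ready: hand back and re-pick.
            if (preferredEnvelope != null) { inner.update(preferredEnvelope); preferredEnvelope = null; }
            Envelope next = inner.choose();
            preferred = (next == null) ? null : next.stream;
            taken = (next == null) ? 0 : 1;
            return next;
        }
    }

    // Outermost layer: hold back other streams until every bootstrap stream has caught up.
    static final class BootstrapLayer implements Chooser {
        private final Chooser inner;
        private final Map<String, Integer> backlog; // messages left per bootstrap stream (assumed >= 1)
        private final List<Envelope> heldBack = new ArrayList<>();
        BootstrapLayer(Chooser inner, Map<String, Integer> backlog) {
            this.inner = inner;
            this.backlog = new HashMap<>(backlog);
        }
        public void update(Envelope e) {
            if (!backlog.isEmpty() && !backlog.containsKey(e.stream)) heldBack.add(e);
            else inner.update(e);
        }
        public Envelope choose() {
            Envelope e = inner.choose();
            if (e != null && backlog.containsKey(e.stream)) {
                int left = backlog.get(e.stream) - 1;
                if (left <= 0) backlog.remove(e.stream); else backlog.put(e.stream, left);
                if (backlog.isEmpty()) { heldBack.forEach(inner::update); heldBack.clear(); }
            }
            return e;
        }
    }

    static Chooser compose(Map<String, Integer> priorities, int batchSize, Map<String, Integer> backlog) {
        return new BootstrapLayer(new BatchLayer(new PriorityLayer(priorities), batchSize), backlog);
    }

    static List<String> demo() {
        Map<String, Integer> priorities = new HashMap<>();
        priorities.put("live", 10);
        priorities.put("batch", 1);
        Map<String, Integer> backlog = new HashMap<>();
        backlog.put("profiles", 1);
        Chooser chooser = compose(priorities, 2, backlog);
        chooser.update(new Envelope("batch", "b1"));
        chooser.update(new Envelope("live", "l1"));
        chooser.update(new Envelope("profiles", "p1"));
        List<String> order = new ArrayList<>();
        for (Envelope e = chooser.choose(); e != null; e = chooser.choose()) order.add(e.payload);
        return order;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [p1, l1, b1]
    }
}
```

In the demo, the bootstrap stream ("profiles") drains first, and only then do 
the "live" and "batch" envelopes get released to the inner layers, where 
"live" wins on priority. With priority innermost, the batch layer decides 
whether to stay on the current stream before priority is consulted again.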

> Fine-grain control over stream consumption
> ------------------------------------------
>
>                 Key: SAMZA-2
>                 URL: https://issues.apache.org/jira/browse/SAMZA-2
>             Project: Samza
>          Issue Type: Bug
>          Components: container
>    Affects Versions: 0.6.0
>            Reporter: Chris Riccomini
>            Assignee: Chris Riccomini
>             Fix For: 0.7.0
>
>         Attachments: SAMZA-2.0.patch, SAMZA-2.1.patch, SAMZA-2.2.patch, 
> SAMZA-2.3.patch
>
>
> Currently, Samza exposes configuration in the form of 
> "streams.%s.consumer.max.bytes.per.sec" for throttling the number of bytes 
> the Task will read from a stream. This is a feature request for programmatic 
> fine-grain control over stream consumption. The use-case is a Samza task that 
> will be consuming multiple streams where some streams may be from live 
> systems that have stricter SLA requirements and must always be prioritized 
> over other streams that may be from batch systems. The above configuration is 
> not the ideal way to express this type of stream prioritization because 
> configuring the "batch" streams with a low consumption rate will decrease the 
> overall throughput of the system when there is no data in the "live" streams. 
> Furthermore, we'll want to throttle each "batch" stream based on external 
> signals that can change over time. Because of the dynamic nature of these 
> external signals, we would like to have a programmatic interface that can 
> dynamically change the prioritization as the signal changes.
> Design proposal:
> https://wiki.apache.org/samza/Pluggable%20MessageChooser
> Review board:
> https://reviews.apache.org/r/13725/



--
This message was sent by Atlassian JIRA
(v6.1#6144)
