[jira] [Comment Edited] (STORM-1443) Support customizing parallelism in StormSQL

Jungtaek Lim (JIRA) Fri, 14 Oct 2016 14:25:28 -0700

    [ 
https://issues.apache.org/jira/browse/STORM-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15576529#comment-15576529
 ]


Jungtaek Lim edited comment on STORM-1443 at 10/14/16 9:24 PM:
---------------------------------------------------------------

I feel we're exposing every control to users and making things too 
complicated. I'm not sure the target users of Storm SQL want to tune every 
component, since the goal of Storm SQL is ease of use.
Unless users include a CPU-intensive UDF, repartitioning should be the most 
expensive operation and should be avoided. Trident optimizes the logical plan 
of the Trident topology, and while planning it can pack multiple bolts into 
one. I haven't tested how Trident builds the physical plan when bolts that 
could be packed together have different parallelism, but if we optimize further 
to pack multiple logical tasks into one physical task, repartitioning matters.

So I'd like to start with spout-level parallelism, which the other stages will 
follow up to just before aggregation, and revisit if we find a reason to 
configure each stage.
I have some ideas in mind for how to specify / determine spout-level 
parallelism, so I'll limit support for a custom parallelism hint to the data 
source producer (spout) only.
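
To make the proposal concrete, here is a rough sketch of the rule as a toy 
model. It is not Storm or Storm SQL code: the class ParallelismSketch, the 
Stage type and the assign method are all hypothetical names invented for 
illustration. It also assumes that stages from the first repartitioning stage 
(e.g. aggregation) onward fall back to the default parallelism of 1 mentioned 
in the issue description, which is only one possible reading.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types exist in Storm or Storm SQL.
class ParallelismSketch {

    // A logical stage of the query plan. `repartitions` marks stages such as
    // aggregation that need tuples to be shuffled across tasks.
    static final class Stage {
        final String name;
        final boolean repartitions;

        Stage(String name, boolean repartitions) {
            this.name = name;
            this.repartitions = repartitions;
        }
    }

    // Assumption: after a repartition we fall back to the current default of 1.
    static final int DEFAULT_PARALLELISM = 1;

    // Returns the parallelism per stage, in plan order. Stages inherit the
    // spout's parallelism until the first repartitioning stage is reached.
    static Map<String, Integer> assign(int spoutParallelism, List<Stage> stages) {
        Map<String, Integer> result = new LinkedHashMap<>();
        result.put("spout", spoutParallelism);
        int current = spoutParallelism;
        for (Stage stage : stages) {
            if (stage.repartitions) {
                current = DEFAULT_PARALLELISM;
            }
            result.put(stage.name, current);
        }
        return result;
    }
}
```

With a spout hint of 4 and the stages filter, project, aggregate (the only 
repartitioning stage) and output, this model assigns 4 to the spout, filter 
and project, and 1 to aggregate and output, so no repartition is introduced 
before the aggregation.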


was (Author: kabhwan):
I feel we're exposing every control to users and making things too 
complicated. I'm not sure the target users of Storm SQL want to tune every 
component, since the goal of Storm SQL is ease of use.
Unless users include a CPU-intensive UDF, repartitioning should be the most 
expensive operation and should be avoided. Trident optimizes the logical plan 
of the Trident topology, and while planning it can pack multiple bolts into 
one. I haven't tested how Trident builds the physical plan when bolts that 
could be packed together have different parallelism, but if we optimize further 
to pack multiple logical tasks into one physical task, repartitioning matters.

So I'd like to start with spout-level parallelism, which the other stages will 
follow up to just before aggregation, and revisit if we find a reason to 
configure each stage.
I have some ideas in mind for how to specify / determine spout-level 
parallelism, so I'll close this for now and open a new issue.

> Support customizing parallelism in StormSQL
> -------------------------------------------
>
>                 Key: STORM-1443
>                 URL: https://issues.apache.org/jira/browse/STORM-1443
>             Project: Apache Storm
>          Issue Type: New Feature
>          Components: storm-sql
>            Reporter: Haohui Mai
>
> Currently all processors in StormSQL have a default parallelism of 1. It is 
> desirable to have the ability to set parallelism for each processor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
