Pramod Immaneni created APEXCORE-348:
----------------------------------------

             Summary: Load based stream partitioning
                 Key: APEXCORE-348
                 URL: https://issues.apache.org/jira/browse/APEXCORE-348
             Project: Apache Apex Core
          Issue Type: Improvement
            Reporter: Pramod Immaneni
            Assignee: Pramod Immaneni


In some scenarios the downstream partitions of an upstream operator do not 
perform uniformly, so overall performance is sub-optimal and dictated by the 
slowest partitions. The cause can be data related, where some partitions 
receive more data to process than others, or environment related, where some 
partitions run slower than others because they are on heavily loaded nodes.

A solution based on functionality currently available in the engine would be to 
write a StreamCodec implementation that distributes data among the partitions 
so that each partition receives a similar amount of data to process. We should 
consider adding StreamCodecs like these to the library; however, they do not 
solve the problem when it is environment related.
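
As a rough illustration of the even-distribution idea, here is a minimal 
round-robin sketch. The PartitionSelector interface and RoundRobinSelector 
class are hypothetical stand-ins for the partition-selection part of a 
StreamCodec, not the actual Apex interface, and the sketch assumes tuples do 
not need to be grouped by key.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the partition-selection part of a stream codec.
// This is an assumption for illustration, not the Apex StreamCodec interface.
interface PartitionSelector<T> {
    int getPartition(T tuple);
}

// Cycles through the partitions so each one receives a similar tuple count.
final class RoundRobinSelector<T> implements PartitionSelector<T> {
    private final int partitionCount;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinSelector(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        this.partitionCount = partitionCount;
    }

    @Override
    public int getPartition(T tuple) {
        // The tuple content is ignored; floorMod keeps the index
        // non-negative even if the counter overflows.
        return Math.floorMod(next.getAndIncrement(), partitionCount);
    }
}
```

This balances tuple counts only. It cannot react to a partition that is slow 
because of its environment.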

For that, a better and more comprehensive approach would be to look at how data 
is being consumed by the downstream partitions from the BufferServer and use 
that information to decide how to send future data. If some partitions are 
behind others in consuming data, new data can be directed to the other 
partitions. One way to do this would be to relay this statistical and 
positional information from the BufferServer to the upstream publishers. The 
publishers can then use it, for example by making it available to StreamCodecs 
to affect the destination of future data.
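
One possible shape for this, sketched under assumptions: the publisher keeps a 
per-partition backlog figure, refreshed from whatever the BufferServer reports, 
and sends each new tuple to the partition that is least behind. The class 
LeastBacklogSelector and its methods reportBacklog and selectPartition are 
hypothetical names, and the backlog is assumed to be a count of tuples not yet 
consumed. No such feedback API exists in the engine today.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical publisher-side helper; all names here are assumptions.
final class LeastBacklogSelector {
    // Last known number of unconsumed tuples for each downstream partition.
    private final AtomicLongArray backlog;

    LeastBacklogSelector(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        this.backlog = new AtomicLongArray(partitionCount);
    }

    // Called when consumption statistics for a partition are relayed upstream.
    void reportBacklog(int partition, long pendingTuples) {
        backlog.set(partition, pendingTuples);
    }

    // Picks the partition with the smallest known backlog
    // (lowest index wins ties).
    int selectPartition() {
        int best = 0;
        long bestBacklog = backlog.get(0);
        for (int i = 1; i < backlog.length(); i++) {
            long pending = backlog.get(i);
            if (pending < bestBacklog) {
                best = i;
                bestBacklog = pending;
            }
        }
        // Count the tuple about to be sent so that a burst arriving
        // between reports is not all sent to one partition.
        backlog.incrementAndGet(best);
        return best;
    }
}
```

A usage example: after reportBacklog(0, 10), reportBacklog(1, 2) and 
reportBacklog(2, 5), selectPartition() returns 1. If partition 1 later reports 
a backlog of 20, the next call returns 2.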



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
