Re: Unbalanced processing of KeyedStream

Ken Krugler Wed, 02 Jan 2019 08:29:32 -0800

FWIW, if you want exactly one record per operator, then this code 
<https://github.com/ScaleUnlimited/flink-crawler/blob/ba06aa87226b4c44e30aba6df68984d53cc15519/src/main/java/com/scaleunlimited/flinkcrawler/utils/FlinkUtils.java#L153>
 should generate key values that will be partitioned properly.
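
As a rough, standalone illustration of the idea (this is not the linked code): search for integer keys until every operator index has one key that maps to it. The class and method names below are made up for the example. The hash and the key-group arithmetic are written from memory to resemble what `KeyGroupRangeAssignment.assignKeyToParallelOperator()` does, so treat them as assumptions; in a real job you would call the Flink method directly instead of these stand-ins.

```java
public class KeyBalancer {

    // ASSUMPTION: stand-in for Flink's internal murmur-style hash of key.hashCode(),
    // written from memory. It always returns a non-negative value.
    public static int murmurHash(int code) {
        code *= 0xcc9e2d51;
        code = Integer.rotateLeft(code, 15);
        code *= 0x1b873593;
        code = Integer.rotateLeft(code, 13);
        code = code * 5 + 0xe6546b64;
        code ^= 4;
        // Final bit mixing.
        code ^= code >>> 16;
        code *= 0x85ebca6b;
        code ^= code >>> 13;
        code *= 0xc2b2ae35;
        code ^= code >>> 16;
        if (code >= 0) {
            return code;
        }
        return code != Integer.MIN_VALUE ? -code : 0;
    }

    // ASSUMPTION: key -> key group -> operator index, mirroring the arithmetic
    // described for KeyGroupRangeAssignment. Requires parallelism <= maxParallelism.
    public static int operatorIndexForKey(int key, int maxParallelism, int parallelism) {
        int keyGroup = murmurHash(Integer.hashCode(key)) % maxParallelism;
        return keyGroup * parallelism / maxParallelism;
    }

    // Brute-force search: returns an array where keys[i] is an integer key
    // that the assignment above routes to operator index i.
    public static int[] keysForOperators(int maxParallelism, int parallelism) {
        int[] keys = new int[parallelism];
        boolean[] found = new boolean[parallelism];
        int remaining = parallelism;
        for (int candidate = 0; remaining > 0 && candidate < Integer.MAX_VALUE; candidate++) {
            int index = operatorIndexForKey(candidate, maxParallelism, parallelism);
            if (!found[index]) {
                found[index] = true;
                keys[index] = candidate;
                remaining--;
            }
        }
        if (remaining > 0) {
            throw new IllegalStateException("No key found for some operator index");
        }
        return keys;
    }

    public static void main(String[] args) {
        // Hypothetical settings: max parallelism 128, operator parallelism 4.
        int[] keys = keysForOperators(128, 4);
        for (int i = 0; i < keys.length; i++) {
            System.out.println("operator " + i + " <- key " + keys[i]);
        }
    }
}
```

You would then map each logical key to one of the generated values before calling `keyBy`, so that every parallel instance receives exactly one key. The generated keys are only valid for the parallelism and max parallelism they were computed for.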


— Ken

> On Jan 2, 2019, at 12:16 AM, Jozef Vilcek <[email protected]> wrote:
> 
> Hello,
> 
> I am facing a problem where a KeyedStream is poorly parallelised across
> workers when the number of keys is close to the parallelism.
> 
> Some workers process zero keys, some more than one. This is because of
> `KeyGroupRangeAssignment.assignKeyToParallelOperator()` in
> `KeyGroupStreamPartitioner` as I found out in this post:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Use-keyBy-to-deterministically-hash-each-record-to-a-processor-task-slot-td16483.html
> 
> I would like to find out what are my options here.
> * Is there a reason why a custom partitioner cannot be used with a keyed
> stream?
> * Could there be API support for creating correct, KeyedStream-compatible
> keys? It would also make `DataStreamUtils.reinterpretAsKeyedStream()` more
> usable for certain scenarios.
> * Are there any other options I have?
> 
> Many thanks in advance.
> 
> Best,
> Jozef

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
Custom big data solutions & training
Flink, Solr, Hadoop, Cascading & Cassandra
