[ https://issues.apache.org/jira/browse/BEAM-7589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887847#comment-16887847 ]

Brachi Packter commented on BEAM-7589:
--------------------------------------

[~aromanenko] Still happens; I can't run Dataflow with the latest snapshot.



I'm getting an AbstractMethodError. (Can it be related to this PR:
[https://github.com/apache/beam/pull/8311]?)

Adding the stack trace again:
{code:java}
java.lang.AbstractMethodError: org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.schemaElement(I)Ljava/lang/Object;
	at org.apache.beam.sdk.schemas.transforms.Convert$ConvertTransform$1$DoFnInvoker.invokeProcessElement(Unknown Source)
	at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:213)
	at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:178)
	at org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:330)
	at org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.process(ParDoOperation.java:44)
	at org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:49)
	at org.apache.beam.runners.dataflow.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:276)
{code}
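For context, a minimal sketch of the kind of schema-aware step that goes through this code path (the {{MyEvent}} POJO and {{ConvertRepro}} class are hypothetical, not our actual pipeline):
{code:java}
import org.apache.beam.sdk.schemas.JavaFieldSchema;
import org.apache.beam.sdk.schemas.annotations.DefaultSchema;
import org.apache.beam.sdk.schemas.transforms.Convert;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

// A hypothetical schema-annotated POJO; any schema-aware element type works.
@DefaultSchema(JavaFieldSchema.class)
class MyEvent {
  public String id;
  public long count;
}

class ConvertRepro {
  // Convert expands to a DoFn whose generated invoker calls
  // ProcessContext.schemaElement(int) at runtime; if the worker's repackaged
  // runners-core predates that method, it throws the AbstractMethodError above.
  static PCollection<Row> toRows(PCollection<MyEvent> events) {
    return events.apply(Convert.toRows());
  }
}
{code}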

> Kinesis IO.write throws LimitExceededException
> ----------------------------------------------
>
>                 Key: BEAM-7589
>                 URL: https://issues.apache.org/jira/browse/BEAM-7589
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-kinesis
>    Affects Versions: 2.11.0
>            Reporter: Anton Kedin
>            Assignee: Alexey Romanenko
>            Priority: Major
>             Fix For: 2.15.0
>
>          Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Follow up from https://issues.apache.org/jira/browse/BEAM-7357:
>  
> ----
> Brachi Packter added a comment - 13/Jun/19 09:05
>  [~aromanenko] I think I've found what triggers the shard map update.
> You create a producer per bundle (in the SetUp function), and multiplied by
> the number of workers this gives a huge number of producers; I believe this
> triggers the "update shard map" call.
> If I copy your code and create *one* producer for every worker, the
> error disappears.
> Can you just remove the producer creation from the setUp method and move it
> to a static field in the class, created once when the class is initialized?
> See the similar issue we had with JdbcIO: the connection pool was created per
> setUp method, and we moved it to be a static member so that there is one pool
> per JVM. Ask [~iemejia] for more details.
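> Something like this, roughly (an illustrative sketch, not the actual
> KinesisIO code; the class name and config are made up):
> {code:java}
> import com.amazonaws.services.kinesis.producer.KinesisProducer;
> import com.amazonaws.services.kinesis.producer.KinesisProducerConfiguration;
> import org.apache.beam.sdk.transforms.DoFn;
>
> class KinesisWriterFn extends DoFn<byte[], Void> {
>   // One KPL producer per JVM instead of one per DoFn instance: all instances
>   // on a worker share a single native process and one shard-map poller.
>   private static volatile KinesisProducer sharedProducer;
>
>   private static KinesisProducer getOrCreateProducer(
>       KinesisProducerConfiguration config) {
>     if (sharedProducer == null) {
>       synchronized (KinesisWriterFn.class) {
>         if (sharedProducer == null) {
>           sharedProducer = new KinesisProducer(config);  // once per JVM
>         }
>       }
>     }
>     return sharedProducer;
>   }
>
>   private transient KinesisProducer producer;
>
>   @Setup
>   public void setup() {
>     // Reuse the shared instance instead of building a new producer here.
>     producer = getOrCreateProducer(new KinesisProducerConfiguration());
>   }
> }
> {code}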
> ----
> Alexey Romanenko added a comment - 14/Jun/19 14:31 - edited
>  [~brachi_packter] What kind of error do you get in this case? Could you
> post an error stacktrace / exception message?
>  Also, it would be helpful (if possible) if you could provide more
> details about your environment and pipeline, such as your pipeline
> topology, which runner you use, the number of workers in your cluster, etc.
>  For now, I can't reproduce it on my side, so any additional info will be
> helpful.
> ----
> Brachi Packter added a comment - 16/Jun/19 06:44
>  I get the same error:
> {code:java}
> [0x00001728][0x00007f13ed4c4700] [error] [shard_map.cc:150] Shard map update 
> for stream "**" failed. Code: LimitExceededException Message: Rate exceeded 
> for stream poc-test under account **.; retrying in 5062 ms
> {code}
> I'm not seeing the full stack trace, but I can also see this in the log:
> {code:java}
> [2019-06-13 08:29:09.427018] [0x000007e1][0x00007f8d508d3700] [warning] [AWS 
> Log: WARN](AWSErrorMarshaller)Encountered AWSError Throttling Rate exceeded
> {code}
> More details:
>  I'm using the Dataflow runner, Java SDK 2.11.
> 60 workers initially (with autoscaling, and also with the
> "enableStreamingEngine" flag).
> Normally I'm producing 4-5k records per second, but when I have latency this
> can grow by 3-4x.
> When I start the Dataflow job I have latency, so I produce more data,
> and I fail immediately.
> Also, I have consumers (a 3rd-party tool); I know they call DescribeStream
> every 30 seconds.
> My pipeline runs on GCP, reading data from PubSub at around
> 20,000 records per second (in regular time; under latency even 100,000
> records per second). It does a lot of aggregation and counting based on some
> dimensions (using Beam SQL) over a 1-minute sliding window, and writes
> the aggregation results to a Kinesis stream.
> My stream has 10 shards, and my partition key logic generates a UUID per
> record:
> UUID.randomUUID().toString()
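> For reference, this is roughly how that key is wired into the write (a
> sketch of my setup; only the stream name is real, the wrapper class is
> illustrative):
> {code:java}
> import java.util.UUID;
> import org.apache.beam.sdk.io.kinesis.KinesisIO;
> import org.apache.beam.sdk.io.kinesis.KinesisPartitioner;
>
> class KinesisWriteConfig {
>   // A fresh random UUID per record spreads writes evenly across all shards.
>   static KinesisIO.Write kinesisWrite() {
>     return KinesisIO.write()
>         .withStreamName("poc-test")
>         .withPartitioner(new KinesisPartitioner() {
>           @Override
>           public String getPartitionKey(byte[] value) {
>             return UUID.randomUUID().toString();
>           }
>
>           @Override
>           public String getExplicitHashKey(byte[] value) {
>             return null;  // let Kinesis hash the partition key itself
>           }
>         });
>   }
> }
> {code}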
> I hope this gives you some more context on my problem.
> Another suggestion: could you fix the issue as I suggested and provide
> me a specific version for testing, without merging it to master? (I would
> do it myself, but I had trouble building the huge Apache Beam repository
> locally..)



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
