[GitHub] [beam] amontoli commented on issue #22809: [Bug]: Python SDK gets stuck when using Unbounded PCollection in streaming mode on GroupByKey after ReadFromKafka on DirectRunner, FlinkRunner and DataflowRunner

GitBox Tue, 08 Nov 2022 06:23:14 -0800


amontoli commented on issue #22809:
URL: https://github.com/apache/beam/issues/22809#issuecomment-1307301545


   I have a very similar issue, but instead of using Kafka module in Beam I am 
using the [Kafka module in 
beam_nuggets](http://mohaseeb.com/beam-nuggets/beam_nuggets.io.kafkaio.html), a 
wrapper of the [Kafka Python](https://pypi.org/project/kafka-python/) client. 
With this source I have to add the timestamp by hand using 
beam.window.TimestampedValue.
   
   I tried to analyze the data after applying the window transformation by 
using the AnalyzeElement class defined [here (Example 
2)](https://beam.apache.org/documentation/transforms/python/elementwise/pardo/).
 Data is correctly assigned to a window, but `GroupByKey` never gets called.
   I have used the Direct Runner and the Flink Portable one. I have also tried 
using a non-default trigger (`trigger=trigger.AfterWatermark()`).
   
   I do not know if it is related, but I have also tried with reading from file 
with ReadFromText and the streaming pipeline option: data is processed line by 
line up to the GroupByKey step, but the latter is called only after the whole 
file has been read, as if the window trigger does not activate until the 
PCollection has ended.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [beam] amontoli commented on issue #22809: [Bug]: Python SDK gets stuck when using Unbounded PCollection in streaming mode on GroupByKey after ReadFromKafka on DirectRunner, FlinkRunner and DataflowRunner

Reply via email to