scwhittle commented on pull request #13472:
URL: https://github.com/apache/beam/pull/13472#issuecomment-748958999


   Thanks Alexey! I made this change after generating kafka data and then 
consuming it with a dataflow pipeline showed the watermark was around realtime 
despite events being historical.
   
   After these changes, generating the output and consuming it running the 
pipeline on the dataflow runner had correct watermarks in the past.
   
   I ran the following command to generate kafka data (using dataflow runner)
   `./gradlew :sdks:java:testing:nexmark:run 
-Pnexmark.runner=":runners:google-cloud-dataflow-java" 
-Pnexmark.args="--exportSummaryToBigQuery=false --resourceNameMode=VERBATIM 
--runner=DataflowRunner --project=${PROJECT?:} --region=${REGION?:} 
--gcpTempLocation=${TEMP_LOCATION?:} --streaming=true --manageResources=false 
--monitorJobs=false --sourceType=KAFKA --pubSubMode=PUBLISH_ONLY 
--bootstrapServers=${BOOTSTRAP_SERVER?:} --kafkaTopic=${KAFKA_TOPIC?:} 
--query=0 --sinkType=KAFKA --numEvents=${NUM_EVENTS?:} --numWorkers=20"`
   
   And the following is a sample command to run a query consuming from kafka:
   `./gradlew :sdks:java:testing:nexmark:run 
-Pnexmark.runner=":runners:google-cloud-dataflow-java" 
-Pnexmark.args="--exportSummaryToBigQuery=false 
--resourceNameMode=QUERY_RUNNER_AND_MODE --runner=DataflowRunner 
--streaming=true --manageResources=false --monitorJobs=true --sourceType=KAFKA 
--sinkType=COUNT_ONLY --pubSubMode=SUBSCRIBE_ONLY 
--bootstrapServers=${BOOTSTRAP_SERVER?:} --kafkaTopic=${KAFKA_TOPIC?:} 
--query=1 --project=${PROJECT?:} --tempLocation=${TEMP_LOCATION?:} 
--enableStreamingEngine --cancelStreamingJobAfterFinish 
--numKafkaTopicPartitions=${NUM_PARTITIONS?:} --numWorkers=4 
--autoscalingAlgorithm=NONE"`


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to