[ 
https://issues.apache.org/jira/browse/BEAM-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841674#comment-16841674
 ] 

Mark Liu commented on BEAM-7339:
--------------------------------

Two concerns about using more than 100Gb of input:

1. Resources available to the apache-beam-testing project. Postcommit jobs have 
already exceeded quotas such as CPU and disk, so we should limit the number of 
workers in these performance tests. On the other hand, I don't know how long it 
takes to process 100Gb with a given number of workers.
2. Output verification could be hard. Large output may not fit on the Jenkins 
machine, so we may need a special way to verify output correctness.
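One possible way to handle concern 2 is to avoid materializing the output at all and instead stream it through an order-independent checksum, comparing the result with a precomputed expected value. The sketch below assumes the output is sharded text files readable from a local path; the function names are hypothetical, and the sum-of-hashes scheme is just one choice of order-independent checksum, not what the Beam test framework does. It runs under both Python 2 and 3.

```python
"""Sketch: verify large, sharded text output in constant memory.

Hypothetical helper names; assumes output shards are local text files.
"""
import glob
import hashlib

MOD = 2 ** 64


def line_digest(line):
    # 64-bit integer taken from the first 16 hex digits of SHA-256(line).
    return int(hashlib.sha256(line).hexdigest()[:16], 16)


def checksum_lines(lines):
    """Order-independent checksum: sum of per-line digests modulo 2**64.

    Summing (rather than XOR-ing) means duplicate lines do not cancel out,
    and shard order or line order inside shards does not change the result.
    """
    total = 0
    for line in lines:
        total = (total + line_digest(line.rstrip(b"\r\n"))) % MOD
    return total


def checksum_files(pattern):
    """Stream every file matching pattern line by line; only the running sum is kept."""
    def lines():
        for path in sorted(glob.glob(pattern)):
            with open(path, "rb") as f:
                for line in f:
                    yield line
    return checksum_lines(lines())
```

For output on GCS, `glob`/`open` would have to be replaced by something like Beam's `FileSystems.match` and `FileSystems.open`; the checksum logic itself stays the same.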

> Enable 1Gb input for Python wordcount benchmark
> -----------------------------------------------
>
>                 Key: BEAM-7339
>                 URL: https://issues.apache.org/jira/browse/BEAM-7339
>             Project: Beam
>          Issue Type: Task
>          Components: testing
>            Reporter: Mark Liu
>            Assignee: Mark Liu
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Requirement:
> - Use input from: gs://apache-beam-samples/input_small_files/*
> - Use TestDataflowRunner
> - Limit the number of workers
> - Disable autoscaling
> - Enable both py2 and py3 benchmarks
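The requirements above map onto standard Dataflow pipeline options. The sketch below builds such an argument list with the standard library only; the project, bucket, and output values are hypothetical placeholders, and `--input`/`--output` assume the flags of the Python wordcount example. Setting `--num_workers` and `--max_num_workers` to the same value while `--autoscaling_algorithm=NONE` is set keeps the worker pool fixed, which is what keeps CPU and disk usage predictable under the project quota. The code avoids Python 3-only syntax so the same helper could serve both the py2 and py3 benchmarks.

```python
"""Sketch: argv for a fixed-size Dataflow wordcount benchmark.

Project, bucket and output values are hypothetical placeholders.
"""

INPUT_PATTERN = "gs://apache-beam-samples/input_small_files/*"  # from BEAM-7339


def benchmark_args(num_workers,
                   project="my-test-project",           # hypothetical
                   temp_location="gs://my-bucket/tmp",  # hypothetical
                   output="gs://my-bucket/wordcount/out"):  # hypothetical
    """Return argv for a wordcount run with a fixed, non-autoscaling worker pool."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return [
        "--runner=TestDataflowRunner",
        "--project=" + project,
        "--temp_location=" + temp_location,
        "--input=" + INPUT_PATTERN,
        "--output=" + output,
        # Pin the pool size so resource usage stays within quota.
        "--num_workers=%d" % num_workers,
        "--max_num_workers=%d" % num_workers,
        # Disable autoscaling so Dataflow does not grow the pool mid-run.
        "--autoscaling_algorithm=NONE",
    ]


if __name__ == "__main__":
    for arg in benchmark_args(num_workers=10):
        print(arg)
```

The resulting list could then be passed to `apache_beam.options.pipeline_options.PipelineOptions(args)` when constructing the pipeline.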



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
