[
https://issues.apache.org/jira/browse/BEAM-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841674#comment-16841674
]
Mark Liu commented on BEAM-7339:
--------------------------------
Two concerns for >100Gb input:
1. The resources available to the apache-beam-testing project. We have seen
postcommit jobs exceed quotas such as CPU and disk, so we should limit the
number of workers in these performance tests. On the other hand, I don't know
how long it takes to process 100Gb with a given number of workers.
2. Output verification could be hard. Large output may not fit on the Jenkins
machine, so we may need a special way to verify output correctness.
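
For the second concern, one possible approach (an illustration only, not
something proposed in this comment) is to stream the output and compare an
order-independent checksum against a known-good value, so the full output never
has to sit on the Jenkins machine at once. The function names below are
hypothetical, and the "word: count" line format is an assumption.

```python
import hashlib


def line_fingerprint(line):
    """Map one output line to a 64-bit integer via SHA-256."""
    digest = hashlib.sha256(line.encode("utf-8")).hexdigest()
    return int(digest[:16], 16)


def output_checksum(lines):
    """Combine per-line fingerprints by addition modulo 2**64.

    Addition is commutative, so the result does not depend on the order
    in which shards or lines are read. Lines are consumed one at a time,
    so any iterable (for example an open file) can be passed in.
    """
    total = 0
    for line in lines:
        total = (total + line_fingerprint(line.rstrip("\n"))) % (1 << 64)
    return total
```

The expected checksum would have to be computed once from a trusted run and
stored alongside the test; how the output shards are read from storage is left
out of this sketch.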
> Enable 1Gb input for Python wordcount benchmark
> -----------------------------------------------
>
> Key: BEAM-7339
> URL: https://issues.apache.org/jira/browse/BEAM-7339
> Project: Beam
> Issue Type: Task
> Components: testing
> Reporter: Mark Liu
> Assignee: Mark Liu
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Requirement:
> - Use input from: gs://apache-beam-samples/input_small_files/*
> - Use TestDataflowRunner
> - Limit worker number
> - Disable autoscaling
> - Enable both py2 and py3 benchmarks
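
The requirements above could translate into pipeline flags roughly as in the
sketch below. The helper name, the worker count, and the output bucket are
hypothetical; the flags themselves (--runner, --num_workers,
--autoscaling_algorithm) are standard Beam Python pipeline options, and
--input / --output are the arguments taken by the wordcount example. Project
and temp location flags are omitted here.

```python
def build_wordcount_args(num_workers, output):
    """Return command-line flags for a fixed-size Dataflow wordcount run."""
    return [
        "--runner=TestDataflowRunner",
        "--input=gs://apache-beam-samples/input_small_files/*",
        "--output=" + output,
        # Fix the worker pool size and turn autoscaling off.
        "--num_workers=%d" % num_workers,
        "--autoscaling_algorithm=NONE",
    ]


if __name__ == "__main__":
    # Hypothetical values: 10 workers and a placeholder output bucket.
    print(" ".join(build_wordcount_args(10, "gs://hypothetical-bucket/output")))
```

The same flag list works under both Python 2 and Python 3, so a single helper
could feed both benchmark variants.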