boyuanzz commented on pull request #14538: URL: https://github.com/apache/beam/pull/14538#issuecomment-820650099
> > What's the purpose to have BigTableRead run against a larger data set as well?
>
> It is more suitable to be used as a benchmark for performance tracking purposes.

The current 1K-row read finishes in a few seconds. I think it would be nice to separate E2E integration tests from benchmark/load tests. We usually want to run a small data set for an E2E integration test so that it finishes as quickly as possible. Making the read size a pipeline option would let us invoke the same test for different purposes. For example, most of the performance jobs (https://github.com/apache/beam/tree/master/.test-infra/jenkins#performance-jobs) are configurable for input size.
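For illustration, a read-size pipeline option might be sketched as below. This is a configuration sketch, not code from the PR; the interface name `BigtableReadITOptions` and the option name `numRows` are hypothetical, though the Beam annotations (`@Description`, `@Default.Long`) are the standard `PipelineOptions` mechanism.

```java
// Hypothetical options interface for a configurable BigTable read test.
// Only the Beam annotations are real API; the names are illustrative.
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;

public interface BigtableReadITOptions extends PipelineOptions {
  @Description(
      "Number of rows the test reads: small for an E2E integration test, "
          + "large when the same pipeline is run as a benchmark.")
  @Default.Long(1000L)
  Long getNumRows();

  void setNumRows(Long value);
}
```

With an option like this, the same test could be invoked as a fast E2E check (`--numRows=1000`) or as a load test with a much larger value, mirroring how the Jenkins performance jobs parameterize input size.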
