Re: IO IT Patterns: Simplifying data loading

Stephen Sisk Thu, 23 Mar 2017 17:10:03 -0700

thanks, appreciated :)

On Thu, Mar 23, 2017 at 4:59 PM Ted Yu <[email protected]> wrote:


> Looks like you forgot to include JIRA number: BEAM-1799
>
> Cheers
>
> On Thu, Mar 23, 2017 at 4:26 PM, Stephen Sisk <[email protected]> wrote:
>
> > hi!
> >
> > I just opened a jira ticket that I wanted to make sure the mailing list got
> > a chance to see.
> >
> > The problem is that the current design pattern for doing data loading in IO
> > ITs (either writing a small program or using an external tool) is complex,
> > inefficient and requires extra steps like installing external
> > tools/probably using a VM. It also really doesn't scale well to the larger
> > data sizes we'd like to use for performance benchmarking.
> >
> > My proposal is that instead of trying to test read and write separately,
> > the test should be a "write, then read back what you just wrote", all using
> > the IO being tested. To support scenarios like "I want to run my read test
> > repeatedly without re-writing the data", tests would add flags for
> > "skipCleanUp" and "useExistingData".
> >
> > I think we've all likely seen this type of solution when testing storage
> > layers in the past, and I've previously shied away from it in this context,
> > but I think now that I've seen some real ITs and thought about scaling
> > them, in this case it's the right solution.
> >
> > Please take a look at the jira if you have questions - there's a lot more
> > detail there.
> >
> > S
> >
>
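The "write, then read back what you just wrote" proposal, together with the "skipCleanUp" and "useExistingData" flags, can be sketched roughly as follows. This is a minimal, dependency-free illustration under stated assumptions, not Beam code. `FakeIO`, `runTest`, `generateRows` and `checksum` are hypothetical stand-ins. A real IO IT would instead run the IO's own write and read transforms in a pipeline, and would supply the two flags as test configuration rather than as method parameters.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the "write, then read back" IO IT pattern.
 * FakeIO and all helper names are illustrative only; they are not Beam APIs.
 */
public class WriteThenReadIT {

    /** Hypothetical stand-in for the IO under test: a trivial in-memory store. */
    static class FakeIO {
        private final List<String> store = new ArrayList<>();

        void write(List<String> rows) { store.addAll(rows); }

        List<String> read() { return new ArrayList<>(store); }

        void cleanUp() { store.clear(); }

        boolean isEmpty() { return store.isEmpty(); }
    }

    /** Deterministic test data, so the read side knows what to expect without re-writing. */
    static List<String> generateRows(int n) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            rows.add("row-" + i);
        }
        return rows;
    }

    /** Order-independent checksum (a simple sum of hash codes, chosen for brevity). */
    static long checksum(List<String> rows) {
        long sum = 0;
        for (String r : rows) {
            sum += r.hashCode();
        }
        return sum;
    }

    /**
     * One test run using only the IO under test.
     * useExistingData: skip the write and read data left behind by an earlier run.
     * skipCleanUp: leave the data in place so later runs can reuse it.
     * Returns true if the data read back matches the expected data.
     */
    static boolean runTest(FakeIO io, int numRows, boolean useExistingData, boolean skipCleanUp) {
        List<String> expected = generateRows(numRows);
        if (!useExistingData) {
            io.write(expected);
        }
        List<String> actual = io.read();
        boolean ok = actual.size() == expected.size()
                && checksum(actual) == checksum(expected);
        if (!skipCleanUp) {
            io.cleanUp();
        }
        return ok;
    }

    public static void main(String[] args) {
        FakeIO io = new FakeIO();
        // First run: write the data, read it back, and keep it for later runs.
        System.out.println(runTest(io, 1000, false, true));
        // Repeated read-only run: reuse the existing data, then clean up.
        System.out.println(runTest(io, 1000, true, false));
        System.out.println(io.isEmpty());
    }
}
```

Because the expected rows are generated deterministically, a run with "useExistingData" can verify the read path without repeating the write. The order-independent checksum matters because a distributed read need not return rows in the order they were written.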
