Re: IO IT Patterns: Simplifying data loading

Chamikara Jayalath Tue, 28 Mar 2017 22:27:37 -0700

On Tue, Mar 28, 2017 at 3:00 AM Etienne Chauchot <echauc...@gmail.com>
wrote:


> Hi Stephen,
>
> I have some comments below:
>
>
> On 24/03/2017 at 00:26, Stephen Sisk wrote:
> > hi!
> >
> > I just opened a jira ticket that I wanted to make sure the mailing list
> > got a chance to see.
> >
> > The problem is that the current design pattern for doing data loading in
> > IO ITs (either writing a small program or using an external tool) is
> > complex, inefficient and requires extra steps like installing external
> > tools/probably using a VM. It also really doesn't scale well to the
> > larger data sizes we'd like to use for performance benchmarking.
> >
> > My proposal is that instead of trying to test read and write separately,
> > the test should be a "write, then read back what you just wrote", all
> > using the IO being tested.
> Sure, joining read and write tests will let us write less often and
> thus be more efficient. Indeed, instead of writing once for all the read
> test runs and again at each write test run, we will only write once per
> read+write test run. We will also avoid needing a second place to write to.
>

I agree that this is beneficial from a test efficiency perspective but
there is a downside.

I think a failure of this kind of write+read test could be quite hard to
debug, and depending on the I/O it might even be hard to make such a test
non-flaky. For example, for an eventually consistent file system such as
GCS, a failure of a write+read test could mean any one of the following:

* write failed
* read failed
* read was executed before the write finished and the file system reached
a consistent state

At first glance one might think that adding a barrier in the middle that
waits for reads to become consistent would solve that problem, but that
will not be the case if the data source serves requests using multiple
replicas which may be in inconsistent states (which is the case for GCS).
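
To make the replica point concrete, here is a minimal toy sketch in
Python. It is not GCS or Beam code; ReplicatedStore and
wait_until_visible are hypothetical names made up for illustration. The
barrier polls one replica until the write is visible there, yet a read
served by another replica can still be stale.

```python
class ReplicatedStore:
    """Toy eventually consistent store (hypothetical, for illustration).

    A write lands on replica 0 immediately and is queued for the others.
    """

    def __init__(self, replica_count):
        self.replicas = [dict() for _ in range(replica_count)]
        self.pending = []  # queued (replica_index, key, value) updates

    def write(self, key, value):
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def propagate_one(self):
        """Apply the oldest queued update, if there is one."""
        if self.pending:
            index, key, value = self.pending.pop(0)
            self.replicas[index][key] = value

    def read(self, key, replica):
        return self.replicas[replica].get(key)


def wait_until_visible(store, key, expected, replica, max_steps=100):
    """Barrier: poll a single replica until it returns the expected value."""
    for _ in range(max_steps):
        if store.read(key, replica) == expected:
            return True
        store.propagate_one()
    return False


store = ReplicatedStore(replica_count=3)
store.write("file", "data")

# The barrier happens to be served by replica 1 and succeeds...
barrier_passed = wait_until_visible(store, "file", "data", replica=1)

# ...but the read that follows is served by replica 2, which is still stale.
stale_read = store.read("file", replica=2)
```

In this toy model the barrier reports success while the following read
still misses the data, which is the third failure mode listed above.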

Separate read and write tests with fixed input are much easier to
manage/debug.
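
As a rough sketch of what I mean by a read test with fixed input (again
plain Python, not Beam code; fixed_input, content_hash and check_read are
hypothetical helpers, and using a checksum is just one possible way to
verify the result): the expected digest of the pre-loaded data is known
up front, so a mismatch can only point at the read path.

```python
import hashlib


def fixed_input(count):
    """Deterministic records that are assumed to be pre-loaded in the store."""
    return [f"record-{i}" for i in range(count)]


def content_hash(records):
    """Order-independent digest of a collection of string records."""
    digest = hashlib.sha256()
    for record in sorted(records):
        digest.update(record.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def check_read(read_records, expected_hash):
    """Return True if what the read produced matches the known fixed input."""
    return content_hash(read_records) == expected_hash


# The expected digest is computed once from the fixed input.
EXPECTED_HASH = content_hash(fixed_input(5))

# A read that returns the same records in a different order still passes.
read_ok = check_read(list(reversed(fixed_input(5))), EXPECTED_HASH)

# A read that drops a record fails, and the write path is not a suspect.
read_missing = check_read(fixed_input(4), EXPECTED_HASH)
```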

So I think we should be careful when converting I/O ITs to do read+write
and probably should only make this a recommendation for I/O ITs that would
not run into issues due to this.

Just my 2 cents.

Thanks,
Cham


> > To support scenarios like "I want to run my read test
> > repeatedly without re-writing the data", tests would add flags for
> > "skipCleanUp" and "useExistingData".
> But this makes an assumption about the order of test runs: the write test
> needs to have run before the read test can happen. Isn't it a little
> dangerous to make this assumption?
> >
> > I think we've all likely seen this type of solution when testing storage
> > layers in the past, and I've previously shied away from it in this
> > context, but I think now that I've seen some real ITs and thought about
> > scaling them, in this case it's the right solution.
> >
> > Please take a look at the jira if you have questions - there's a lot more
> > detail there.
> >
> > S
> >
> Etienne
>
