On Fri, Mar 6, 2015 at 12:03 PM, Josh Elser <[email protected]> wrote:
> First off, thanks for the good-will in taking the time to ask.
>
> My biggest concern in adopting it as a codebase would be ensuring that it
> isn't another codebase dropped into contrib/ and subsequently ignored. How
> do you plan to avoid this? Who do you see maintaining and running these
> tests?
>

Well, I know I use them when we post candidates. I think it'd be nice if we
all generally got in the habit. Once they've gotten polished up enough to cut
a release, we could add it to e.g. the major release procedure. That would
certainly make sure the community stays on it.

> Some more targeted implementation observations/questions -
>
> * Do you plan to update the scripts to work with Apache Accumulo instead
> of CDH specific artifacts? e.g. [1]
>

Yeah, that's part of the vendor-specific-details cleanup I mentioned. FWIW,
I've also used this for testing the ASF artifacts and it's worked fine.

> * For the MapReduce job specifically, why did you write your own and not
> use an existing "vetted" job like Continuous Ingest? Is there something
> that the included M/R job does which is not already contained by our CI
> ingest and verify jobs?
>

I need to be able to check that none of the data has been corrupted or lost,
and I'd prefer to do it quickly. It's possible for the CI job to have data
corrupted or dropped in a way we can't detect (namely UNREFERENCED cells).

The data load job is considerably easier to run (especially at scale) than
the CI job. Presuming your cluster is configured correctly, you just use the
tool script and a couple of command line parameters, and YARN/MR take care
of the rest. It will also do this across several tables configured with our
different storage options, to make sure we have better coverage.

The data verify job is also more parallelizable than the existing jobs,
since each executor can handle its share of the cells on the map side
without regard for the others (there's a rough sketch of the idea further
down).

For example, on a newly deployed, unoptimized, low-power 5-node cluster I
can launch-and-forget data load + verify, and it will get through ~78M cells
in each of 4 tables (312M cells total) in around 7 minutes of load + 2
minutes of compaction + 2 minutes of verify, without using offline scans.
(And ~2 minutes of the load time is spent taking the two-level pre-split
optimization path when it isn't needed on a cluster this small.) It can do
more, faster, on bigger or better-tuned clusters, but the important bit is
that I can check correctness just by telling it where Accumulo + MR is.

> * It looks like the current script only works for 1.4 to 1.6? Do you plan
> to support 1.5->1.6, 1.5->1.7, 1.6->1.7? How do you envision this adoption
> occurring?
>

The current script only has comments from a couple of vendor releases. I've
used the overall tooling for ASF releases 1.4 -> 1.5 -> 1.6, 1.4 -> 1.6,
1.5 -> 1.6, and 1.6.0 -> 1.6.1.

For the most part, adding another target version is just a matter of
checking whether the APIs still work. With the adoption of semver, that
should be pretty easy. I have toyed before with adding a shim layer for our
API versions and will probably revisit that once there's a 2.0. So I think
adding those other supported upgrade paths will mostly be a matter of
improving the documentation. I'd also like to get some ease-of-use bits
included, like downloading the release or RC tarballs after a prompt for
version numbers. At the very least, the documentation part will be part of
the post-import cleanup.
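To be a little more concrete about that shim idea: the rough shape I have in
mind is something like the interface below. To be clear, nothing like this
exists in the tooling today and every name in it is invented; it's only a
sketch of the design, assuming each supported release line gets a small
adapter built against that version's client API.

/**
 * Hypothetical shim: one implementation per supported Accumulo release line,
 * so the upgrade driver never touches version-specific client classes.
 */
public interface UpgradeTestShim {

  /** Connect to the cluster under test. */
  void connect(String instance, String zookeepers, String user, String password)
      throws Exception;

  /** Create a test table configured with the storage options we want covered. */
  void createTestTable(String tableName) throws Exception;

  /** Load the deterministic test data for a given seed. */
  void loadCells(String tableName, long numCells, long seed) throws Exception;

  /** Re-scan the table and return how many cells are missing or corrupted. */
  long verifyCells(String tableName, long numCells, long seed) throws Exception;
}

The automated script would then just pick the adapter that matches whichever
release it's pointed at.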
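And to expand on the verify point from further up: if every cell can be
checked in isolation, a mapper only needs its own input split. Below is a toy
illustration of that idea; it is not the actual job, the class and method
names are invented, and it assumes Guava and the Accumulo/Hadoop client jars
are on the classpath.

import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

import com.google.common.hash.Hashing;

/**
 * Illustrative only: each cell's value is a deterministic function of its own
 * key plus a per-run seed, so the verify job can recompute and compare on the
 * map side, and a counter checked against the known load size catches drops.
 */
public class SelfDescribingCells {

  /** Value written for a given row/cf/cq by the load job. */
  public static Value valueFor(Text row, Text cf, Text cq, long seed) {
    byte[] digest = Hashing.murmur3_128().newHasher()
        .putLong(seed)
        .putBytes(row.getBytes(), 0, row.getLength())
        .putBytes(cf.getBytes(), 0, cf.getLength())
        .putBytes(cq.getBytes(), 0, cq.getLength())
        .hash().asBytes();
    return new Value(digest);
  }

  /** Map-side check in the verify job: recompute the expected value and compare. */
  public static boolean isIntact(Key key, Value actual, long seed) {
    Value expected = valueFor(key.getRow(), key.getColumnFamily(),
        key.getColumnQualifier(), seed);
    return expected.equals(actual);
  }
}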
> * As far as exercising internal Accumulo implementation, I think you have
> the basics covered. What about some more tricky things over the metadata
> table (clone, import, export, merge, split table)? How might additional
> functionality be added in a way that can be automatically tested?
>

Those would be great additions. The current compatibility test is limited to
data compatibility. Adding packages for other API hooks (like checking that
import/export works across versions) should just be a matter of writing a
driver that talks to the Accumulo API and then updating the automated
script. At least import/export and clone should be relatively easy, to the
extent that we can leverage the data compatibility tools to put a table in a
known state and then check that the other tables match.

> * It seems like you have also targeted a physical set of nodes. Have you
> considered actually using some virtualization platform (e.g. vagrant) to
> fully automate upgrade-testing? If there is a way that a user can spin up a
> few VMs to do the testing, the barrier to entry is much lower (and likely
> more foolproof) than requiring the user to set up the environment.
>

To date, our main concern has been testing against live clusters. Mostly
that's an artifact of internal testing procedures. I'd love it if someone
who's proficient in vagrant or docker or whatever could help add a
lower-barrier test point.

--
Sean
