The distributed tests are self-contained except for letting the application
know (1) where to find Elasticsearch and (2) setting spark.home for the
SparkLauncher. MapReduce distributed testing that reads from HDFS should
work out-of-the-box on a correctly configured Hadoop cluster running YARN.

For Elasticsearch, you need to set 'es.nodes' and 'es.port' in the
pirk.properties file packaged in the jar. This is actually a bug -- they
should be settable from a local properties file; I have entered a JIRA and
will fix it shortly.
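As a sketch, those two entries in pirk.properties look something like the
following (the host and port values are placeholders -- point them at your
own Elasticsearch cluster):

```properties
# Elasticsearch settings needed by the distributed tests
# (example placeholder values; substitute your cluster's node and port)
es.nodes=es-node1.example.com
es.port=9200
```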

For Spark, the DistributedTestSuite uses SparkLauncher, which needs to
know where to find the bin directory containing the spark-submit script.
The 'spark.home' property in the pirk.properties file defines this
location. Typically, it is '/usr', but (FYI) I have needed to put the full
path to the 'original' install location in the Cloudera distribution that I
have been using lately.
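For example, the entry looks like this (the value shown is just the typical
default mentioned above; on other distributions you may need the full path
to the directory whose bin/ contains spark-submit):

```properties
# Spark installation root; SparkLauncher looks for bin/spark-submit here
# (example value -- adjust for your distribution)
spark.home=/usr
```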

I will update the testing webpage to reflect the properties that need to be
set for the various distributed tests. Thanks for pointing this out!

As time allows, I can add primers for AWS and GCP (specifically).



On Tue, Aug 23, 2016 at 9:19 AM, Tim Ellison <[email protected]> wrote:

> On 19/08/16 14:20, Ellison Anne Williams wrote:
> > Also, AWS and GCP are options - they won't integrate into travis, but
> they
> > are a (relatively) easy way to run through the distributed test suite.
> >
> > I had thought about posting some instructions (at some point) for 'How to
> > Run the Pirk Distributed Tests on AWS/GCP/etc' to help new folks get up
> to
> > speed quickly. Of course, AWS and GCP both have detailed instructions,
> but
> > they take time to wade through. Would that be helpful?
>
> I've had a play with AWS and got a Spark cluster defined and started --
> but our instructions for running Pirk distributed tests [1] don't really
> give enough information on how to send it work to do.
>
> It'll take me a while to figure it out from the code, so if you can
> share any properties etc that would be helpful.
>
> [1] http://pirk.incubator.apache.org/for_developers#testing
>
> Thanks,
> Tim
>
>
> > On Fri, Aug 19, 2016 at 9:09 AM, Darin Johnson <[email protected]>
> > wrote:
> >
> >> I've built full integration tests with hadoop-minicluster before.
> They're
> >> a pain to setup but aren't bad to maintain once done and could be
> >> integrated into travis-ci.
> >>
> >> On Fri, Aug 19, 2016 at 9:02 AM, Tim Ellison <[email protected]>
> >> wrote:
> >>
> >>> On 18/08/16 17:12, Ellison Anne Williams wrote:
> >>>> As a friendly public service announcement - please make sure that you
> >> run
> >>>> the distributed test suite before you accept a PR (or at least, before
> >>>> accepting a PR that touches anything affecting the tests).
> >>>
> >>> Mea culpa.
> >>>
> >>> My usual working practice is:
> >>>  - hack, hack, hack
> >>>  - run mvn clean test locally
> >>>  - commit to new local branch
> >>>  - push to my github fork
> >>>  - wait until Travis declares it tested ok
> >>>  - open the PR, expect the PR to pass the Travis checks
> >>>
> >>> Now I agree that I should also be doing the distributed tests; and even
> >>> more so as I work my way up the Pirk stack into the distributed code.
> >>>
> >>> What I really want is the equivalent of a Travis check for the stuff
> I'm
> >>> doing, and the PRs I'm reviewing.  Any thoughts about how we can
> achieve
> >>> that as I try to figure out how I can run the distributed tests?
> >>>
> >>> Regards,
> >>> Tim
> >>>
> >>>
> >>>
> >>
> >
>
