On this note, I ran the HDFS tests on a 3-node cluster and everything went fine. I would like to help with this as much as I can.

I have submitted a pull request [1] with instructions on how to run the HDFS queries.

I also have some Ansible scripts that I use to quickly set up Hadoop clusters; they could help with that installation.

Best regards,
Efi

[1] https://github.com/apache/vxquery/pull/24

On 11/08/2015 10:58 PM, Michael Carey wrote:
PS - https://github.com/TU-Berlin-DIMA/myriad-toolkit/wiki

On 8/11/15 12:25 PM, Eldon Carman wrote:
Hi Guys,

We have an opportunity to test VXQuery on AWS. I wanted to get feedback on the tasks needed to prepare a scale test on AWS. What do we need to create in
AWS, and what coding tasks do we need to complete first?

I think the test would be a great opportunity to try out the YARN and HDFS code (from GSoC). Also, now that we have a handful of XMark queries working
(also from GSoC), they could be used for the test. XMark includes an XML
generator for a single machine.

What types of scale tests would be good in this environment? Typically we
use Scale-Up and Speed-Up tests.
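For reference, a sketch of the usual definitions of the two metrics. The notation is introduced here for illustration: T(N, D) is the run time of a query on N nodes over a data set of size D.

```latex
\text{speed-up}(N) = \frac{T(1, D)}{T(N, D)}, \qquad
\text{scale-up}(N) = \frac{T(1, D)}{T(N, N \cdot D)}
```

Speed-up keeps the data fixed and adds nodes (ideal value: N). Scale-up grows the data in proportion to the nodes (ideal value: 1, i.e. the run time stays flat).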

What AWS architecture would the test require? I am new to AWS, so
please post your suggestions. One requirement for our test will be for the data to
exceed local memory by five times.

Testing Requirements (suggested):
- AWS Architecture
- XMark Benchmark
- HDFS as data storage
- Data size of 5 times local memory for each node (at least for the largest
scale-up test)
- Scale-Up test (how big can we go?)
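To make the sizing requirement concrete, here is a minimal sketch, assuming every node has the same RAM and that a scale-up run keeps the per-node data fixed at 5 times that RAM while the node count grows. The function name, the 16 GiB instance size and the node counts are hypothetical placeholders, not a decided AWS configuration.

```python
# Hypothetical sizing helper: per-node data = factor x node RAM (the stated
# requirement); in a scale-up test the total data grows linearly with nodes.
MEMORY_FACTOR = 5


def scale_up_plan(node_ram_gib, node_counts, factor=MEMORY_FACTOR):
    """Return (nodes, per-node GiB, total GiB) for each cluster size."""
    per_node = node_ram_gib * factor
    return [(n, per_node, per_node * n) for n in node_counts]


if __name__ == "__main__":
    # Example only: assumed 16 GiB instances, doubling the cluster each step.
    for nodes, per_node, total in scale_up_plan(16, [1, 2, 4, 8, 16]):
        print(f"{nodes:>3} nodes: {per_node} GiB/node, {total} GiB total")
```

With these placeholder numbers, the largest run would need 80 GiB per node and 1280 GiB in total across 16 nodes.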

Here is a list of tasks that I can think of right now...

Coding
  - Finish GSOC projects
  - Update XMark XML generator to work in a cluster environment. (Create
local node data in parallel that is unique across the cluster.)
  - Benchmark scripts for the XMark query tests.
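For the generator task, one possible approach is sketched below; it is not how the XMark generator currently works. Each node receives a disjoint, contiguous range of document IDs and a distinct, reproducible random seed, so nodes can generate in parallel without producing overlapping data. The function names, the `base_seed` parameter and the placeholder XML are all hypothetical.

```python
import random


def node_partition(node_index, num_nodes, total_docs):
    """Contiguous, non-overlapping [start, end) range of document IDs for one node."""
    if not 0 <= node_index < num_nodes:
        raise ValueError("node_index out of range")
    base, extra = divmod(total_docs, num_nodes)
    # The first `extra` nodes each take one additional document.
    start = node_index * base + min(node_index, extra)
    end = start + base + (1 if node_index < extra else 0)
    return start, end


def generate_local_docs(node_index, num_nodes, total_docs, base_seed=42):
    """Yield this node's documents; IDs are unique across the whole cluster."""
    start, end = node_partition(node_index, num_nodes, total_docs)
    # A distinct, reproducible random stream per node.
    rng = random.Random(base_seed + node_index)
    for doc_id in range(start, end):
        # Placeholder content; a real generator would emit XMark-style XML here.
        yield f'<item id="item{doc_id}"><quantity>{rng.randint(1, 10)}</quantity></item>'
```

For example, with 3 nodes and 10 documents, the nodes get the ranges [0, 4), [4, 7) and [7, 10). Contiguous ranges keep each node's output easy to reason about when loading it into HDFS.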

AWS
  - Determine architecture for the test.
  - Scripts/configuration for cluster build out.

Previous tests used only eight local server nodes. The AWS test will evaluate
Apache VXQuery in a cloud environment and could scale to a much larger
cluster size.

Thanks for your feedback.
Preston
