On this note, I ran the HDFS tests on a 3-node cluster and everything
went fine. I would like to help with this one as much as I can.
I have submitted a pull request [1] with instructions on how to run the
HDFS queries.
I also have some Ansible scripts that I use to quickly set up Hadoop
clusters, which could help with that installation.
Best regards,
Efi
[1] https://github.com/apache/vxquery/pull/24
On 11/08/2015 10:58 PM, Michael Carey wrote:
PS - https://github.com/TU-Berlin-DIMA/myriad-toolkit/wiki
On 8/11/15 12:25 PM, Eldon Carman wrote:
Hi Guys,
We have an opportunity to test VXQuery on AWS. I wanted to get feedback
on the tasks needed to prepare a scale test on AWS. What do we need to
create in AWS, and what are the coding tasks we need to complete first?
I think the test would be a great opportunity to test out the YARN and
HDFS code (from GSoC). Also, now that we have a handful of XMark queries
working (also from GSoC), they could be used for the test. XMark includes
an XML generator for a single machine.
What types of scale tests would be good in this environment? Typically we
use Scale-Up and Speed-Up tests.
What AWS architecture would the test require? I am new to AWS, so please
post your suggestions. One requirement for our test will be a data size
of five times local memory.
Testing Requirements (suggested):
- AWS Architecture
- XMark Benchmark
- HDFS as data storage
- Data size 5 times local memory for each node (at least for the largest
scale-up test)
- Scale-Up test (how big can we go?)
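
As a rough illustration of how the sizing requirement interacts with the
two test types, here is a minimal sketch. The node counts and the 8 GB of
memory per node are assumed values for illustration only, not figures
from the planned test, and the function names are hypothetical. It uses
the usual definitions: Scale-Up keeps the data per node fixed as nodes
are added, while Speed-Up keeps the total data fixed.

```python
def scale_up_plan(node_counts, node_memory_gb, memory_multiple=5):
    """Return (nodes, per_node_gb, total_gb) for each cluster size.

    Scale-Up: data per node stays fixed (here, memory_multiple times the
    node's memory), so the total data grows linearly with the cluster.
    """
    per_node_gb = node_memory_gb * memory_multiple
    return [(n, per_node_gb, n * per_node_gb) for n in node_counts]


def speed_up_plan(node_counts, total_gb):
    """Return (nodes, per_node_gb, total_gb) for each cluster size.

    Speed-Up: the total data stays fixed, so each node's share shrinks
    as nodes are added.
    """
    return [(n, total_gb / n, total_gb) for n in node_counts]


if __name__ == "__main__":
    # Assumed example: nodes with 8 GB of memory, clusters of 1, 2 and 4.
    for row in scale_up_plan([1, 2, 4], node_memory_gb=8):
        print("scale-up (nodes, GB per node, GB total):", row)
    for row in speed_up_plan([1, 2, 4], total_gb=40):
        print("speed-up (nodes, GB per node, GB total):", row)
```

Note that in a Speed-Up test only the smallest cluster sees the full
five-times-memory load per node, which is why the requirement above is
stated for the largest Scale-Up test.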
Here is a list of tasks that I can think of right now...
Coding
- Finish GSoC projects
- Update XMark XML generator to work in a cluster environment. (Create
local node data in parallel that is unique across the cluster.)
- Benchmark scripts for the XMark query tests.
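
One possible way to keep generated data unique across the cluster is to
give each node a disjoint range of item ids before it generates its
local data. The sketch below only shows that partitioning step. It is an
assumption about how the task could be approached, not a description of
how the XMark generator works, and `id_range` is a hypothetical helper.

```python
def id_range(node_index, node_count, total_items):
    """Return the half-open [start, stop) range of item ids for one node.

    Ids 0..total_items-1 are split into node_count contiguous ranges that
    do not overlap; the first (total_items % node_count) nodes get one
    extra id each so that every id is assigned exactly once.
    """
    if not 0 <= node_index < node_count:
        raise ValueError("node_index must be in [0, node_count)")
    base, extra = divmod(total_items, node_count)
    start = node_index * base + min(node_index, extra)
    stop = start + base + (1 if node_index < extra else 0)
    return start, stop


if __name__ == "__main__":
    # Assumed example: 10 items spread over a 3-node cluster.
    for node in range(3):
        print(node, id_range(node, 3, 10))
```

Each node can compute its own range from its index and the cluster
size, so the nodes need no coordination while generating in parallel.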
AWS
- Determine architecture for the test.
- Scripts/configuration for cluster build out.
Previous tests only used eight local server nodes. The AWS test will test
Apache VXQuery in a cloud environment and could scale to a much larger
cluster size.
Thanks for your feedback.
Preston