First of all, let me say I don't use EC2 myself - some people at my company do, but I've been fortunate enough to use our internal dev cluster for all the work I've done, so this is total hearsay.

That having been said, the people I know who are using EC2 aren't leaving the cluster running when not in use - there are scripts from (I believe) Cloudera that can allocate and configure the right number of nodes on EC2 with whatever AMI you specify, and then tear them down when you're done.
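For reference, the allocate/tear-down cycle with those scripts looks roughly like this (a sketch only - `hadoop-ec2` is the Cloudera/contrib launcher script, and the cluster name and node count are placeholders; exact commands and flags vary by version):

```shell
# Launch a cluster with 10 worker nodes using your configured AMI
# (assumes AWS credentials are already set up for the scripts)
hadoop-ec2 launch-cluster my-cluster 10

# ... run your jobs against the cluster ...

# Tear everything down when finished, so you stop paying for idle nodes
hadoop-ec2 terminate-cluster my-cluster
```

The point is that the cluster only exists (and only costs money) for the duration of the work.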

On 11/5/09 1:14 PM, Mark Kerzner wrote:
Edmund,

I wanted to install OpenOffice and connect to it from my Java code. I tried
to replicate the complete install by copying it, but there must be something
else there, because I can't connect on Amazon MapReduce, though I can on my
own cluster.
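(For context, the standard arrangement being replicated here is to start OpenOffice headless with a socket listener and connect from Java over the UNO bridge - a sketch, assuming OpenOffice 3.x and an arbitrarily chosen port:)

```shell
# Start OpenOffice as a headless service listening on a local socket
# (port 8100 is an arbitrary choice; any free port works)
soffice -headless -nofirststartwizard \
  -accept="socket,host=127.0.0.1,port=8100;urp;StarOffice.ServiceManager"
```

The Java side then connects to `uno:socket,host=127.0.0.1,port=8100;urp;StarOffice.ServiceManager` via the UNO API. If the listener isn't running, or the connection string differs between the Amazon image and your own cluster, the connect fails even though the install looks complete.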

When you say cheaper, do you mean that keeping your own EC2 machines up and
using them as a Hadoop cluster is in the end cheaper than starting a Hadoop
cluster every time you want to run a job?

Thank you,
Mark

On Thu, Nov 5, 2009 at 12:04 PM, Edmund Kohlwey<ekohl...@gmail.com>  wrote:

If all your dependencies are Java-based (like OpenOffice) you might try
using a dependency manager/build tool like Maven or Ant/Ivy to package the
dependencies in your jar. I'm not sure whether any parts of OpenOffice are
available as Maven artifacts in a public repo, or whether you want to get
into packaging artifacts for your build system, but it's something you might
try.
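As a concrete illustration of the Maven approach (a sketch - whether OpenOffice's UNO jars are actually in a public repo is exactly the open question above), the maven-shade-plugin bundles every declared dependency into the job jar that Hadoop ships to each node:

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <executions>
        <execution>
          <!-- bind the shade goal to the package phase -->
          <phase>package</phase>
          <goals><goal>shade</goal></goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

After this, `mvn package` produces a single self-contained jar, which avoids installing anything on the worker nodes at runtime - but only for pure-Java dependencies; it doesn't help with native Linux packages.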

I think it's cheaper to just use EC2 anyway, so that might be a motivating
factor for you as well.

  Hi,
so far I've been using Amazon MapReduce. However, my app uses a growing
number of Linux packages. I have been installing them on the fly, in
Mapper.configure(), but with OpenOffice this is hard, and I don't get a
service connection even after a local install.

Therefore, my question is about creating my own Hadoop cluster out of EC2
machines. Are there instructions? How hard is it? What are best practices?

Thank you,
Mark



