We want to distribute the processing of text files and of large machine
learning tasks, and to run a distributed database, since we have a large
amount of data.

The problem is that each VM can hold at most 2TB of data (a limitation of
the VMs), and we have 20TB of data. So we have to distribute the processing,
the database, etc. However, all of that data will live in a single huge
shared central file system.
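The capacity constraint can be checked with a few lines of arithmetic. This is a minimal Python sketch. It assumes the data is spread evenly across VMs and ignores filesystem overhead. The `replication` parameter is an illustrative addition: HDFS keeps three copies of each block by default, which triples the raw space needed.

```python
import math

def min_vms_needed(total_tb: float, per_vm_cap_tb: float, replication: int = 1) -> int:
    """Smallest number of VMs whose combined local disk can hold every copy of the data."""
    return math.ceil(total_tb * replication / per_vm_cap_tb)

def per_vm_share_tb(total_tb: float, n_vms: int, replication: int = 1) -> float:
    """Local storage each VM needs if all copies are spread evenly across n_vms."""
    return total_tb * replication / n_vms

if __name__ == "__main__":
    print(min_vms_needed(20, 2))                   # 10 VMs just to hold one copy
    print(min_vms_needed(20, 2, replication=3))    # 30 VMs to hold three copies
    print(per_vm_share_tb(20, 50))                 # 0.4 TB per VM across 50 VMs
    print(per_vm_share_tb(20, 50, replication=3))  # 1.2 TB per VM with three copies
```

Under these assumptions, even three copies spread over 50 VMs come to roughly 1.2TB each, which is below the 2TB limit. Raw capacity is therefore not the hard constraint. The harder question is whether to copy the data off the central storage at all.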

We have heard about myHadoop, but we are not sure how it differs from
Hadoop.

Can we run Hadoop/MapReduce without using HDFS? Is that an option?
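With or without Hadoop, the underlying pattern is the same: a map step over partitions of the input, then a reduce step that merges the partial results. Below is a minimal Python sketch of that pattern. It reads straight from a shared mount with no HDFS involved and assumes every VM sees the same directory listing. The `*.txt` layout, the function names and the word-count task are hypothetical illustrations, not part of the setup described above. Each VM would call `files_for_worker` with its own index, and the demo runs all workers in one process.

```python
import tempfile
from collections import Counter
from pathlib import Path

def files_for_worker(shared_dir: str, worker_id: int, n_workers: int) -> list[Path]:
    """Assign every *.txt file under shared_dir to exactly one worker (round-robin).

    Sorting makes the assignment identical on every VM, so no coordinator is
    needed as long as all VMs see the same directory listing.
    """
    files = sorted(Path(shared_dir).rglob("*.txt"))
    return files[worker_id::n_workers]

def map_word_counts(paths: list[Path]) -> Counter:
    """'Map' step: count whitespace-separated words in this worker's files."""
    counts: Counter = Counter()
    for p in paths:
        with p.open(encoding="utf-8") as f:
            for line in f:
                counts.update(line.split())
    return counts

def reduce_counts(partials: list[Counter]) -> Counter:
    """'Reduce' step: add up the per-worker partial counts."""
    total: Counter = Counter()
    for c in partials:
        total.update(c)
    return total

if __name__ == "__main__":
    # Simulate a shared mount with a temporary directory and 3 workers.
    with tempfile.TemporaryDirectory() as d:
        for i, text in enumerate(["a b a", "b c", "a c c", "d"]):
            Path(d, f"part{i}.txt").write_text(text + "\n", encoding="utf-8")
        partials = [map_word_counts(files_for_worker(d, w, 3)) for w in range(3)]
        print(reduce_counts(partials))
```

Static round-robin assignment keeps the sketch coordinator-free, but it does not balance uneven file sizes. Every read still goes through the central storage, so that storage, not the number of VMs, is likely to cap throughput.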

Best,
PA


2012/5/17 Mathias Herberts <mathias.herbe...@gmail.com>

> Hadoop does not perform well with shared storage and VMs.
>
> The first question should be what you're trying to achieve,
> not what your infrastructure looks like.
> On May 17, 2012 10:39 PM, "Pierre Antoine Du Bois De Naurois"
> <pad...@gmail.com> wrote:
>
> > Hello,
> >
> > We have about 50 VMs and we want to distribute processing across them.
> > However, these VMs share a huge data storage system, so their "virtual"
> > HDDs are all located on the same machine. Would Hadoop be useful for such
> > a configuration? Could we use Hadoop without HDFS, so that we can read
> > and write everything on the same storage?
> >
> > Thanks,
> > PA
> >
>
