Running on multiple CPU's

2007-04-16 Thread Eelco Lempsink
Hi, Inspired by http://www.mail-archive.com/[EMAIL PROTECTED]/msg02394.html I'm trying to run Hadoop on multiple CPU's, but without using HDFS. In my hadoop-site.xml I have the following options (in XML-format of course): mapred.job.tracker = localhost:50099 mapred.map.tasks = 3
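
For reference, the two options quoted in the message above would look roughly like this in hadoop-site.xml. This is only a sketch built from the property names and values given in the message; any further settings in the original post (the snippet is cut off here) are not shown.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Values copied from the message above; other options, if any, are omitted. -->
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:50099</value>
  </property>
  <property>
    <name>mapred.map.tasks</name>
    <value>3</value>
  </property>
</configuration>
```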

Re: Running on multiple CPU's

2007-04-16 Thread Doug Cutting
Eelco Lempsink wrote: Inspired by http://www.mail-archive.com/[EMAIL PROTECTED]/msg02394.html I'm trying to run Hadoop on multiple CPU's, but without using HDFS. To be clear: you need some sort of shared filesystem, if not HDFS, then NFS, S3, or something else. For example, the job client

Re: Running on multiple CPU's

2007-04-16 Thread jafarim
Sorry if this is off topic, but we experienced very low bandwidth with Hadoop while copying files to/from the cluster (some 1/100 compared to a plain Samba share). The bandwidth did not improve at all by adding nodes to the cluster. At that time I thought that Hadoop is not supposed to be used for

Re: Running on multiple CPU's

2007-04-16 Thread Ken Krugler
At 9:41 am -0700 4/16/07, Doug Cutting wrote: Eelco Lempsink wrote: Inspired by http://www.mail-archive.com/[EMAIL PROTECTED]/msg02394.html I'm trying to run Hadoop on multiple CPU's, but without using HDFS. To be clear: you need some sort of shared filesystem, if not HDFS, then NFS, S3, or

bandwidth (Was: Re: Running on multiple CPU's)

2007-04-16 Thread Doug Cutting
Please use a new subject when starting a new topic. jafarim wrote: Sorry if this is off topic, but we experienced very low bandwidth with Hadoop while copying files to/from the cluster (some 1/100 compared to a plain Samba share). The bandwidth did not improve at all by adding nodes to the

Re: Running on multiple CPU's

2007-04-16 Thread Doug Cutting
Ken Krugler wrote: Has anybody been using Hadoop with ZFS? Would ZFS count as a readily available shared file system that scales appropriately? Sun's ZFS? I don't think that's distributed, is it? Does it provide a single namespace across an arbitrarily large cluster? From the

Re: bandwidth (Was: Re: Running on multiple CPU's)

2007-04-16 Thread jafarim
On Linux and JVM 6 with normal IDE disks and a gigabit Ethernet switch with corresponding NIC, and with Hadoop 0.9.11's HDFS. We wrote a C program using the native libs provided in the package, but then we tested again with distcp. The scenario was as follows: We ran the test on a cluster with 1

Re: bandwidth (Was: Re: Running on multiple CPU's)

2007-04-16 Thread Michael Bieniosek
What are you trying to do? Hadoop DFS has different goals than a network file system such as Samba. -Michael On 4/16/07 10:32 AM, jafarim [EMAIL PROTECTED] wrote: On Linux and JVM 6 with normal IDE disks and a gigabit Ethernet switch with corresponding NIC, and with Hadoop 0.9.11's HDFS. We wrote

Re: bandwidth (Was: Re: Running on multiple CPU's)

2007-04-16 Thread Doug Cutting
jafarim wrote: On Linux and JVM 6 with normal IDE disks and a gigabit Ethernet switch with corresponding NIC, and with Hadoop 0.9.11's HDFS. We wrote a C program using the native libs provided in the package, but then we tested again with distcp. The scenario was as follows: We ran the test on a

Re: Running on multiple CPU's

2007-04-16 Thread Ken Krugler
Ken Krugler wrote: Has anybody been using Hadoop with ZFS? Would ZFS count as a readily available shared file system that scales appropriately? Sun's ZFS? I don't think that's distributed, is it? Does it provide a single namespace across an arbitrarily large cluster? From the documentation