Maybe copy your HDFS config here and we can see why it took up 16 gigs of space.
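One hypothesis the config would confirm or rule out, as a quick sketch (the replication value of 8 is an assumption to be checked against hdfs-site.xml, not something stated in this thread):

```shell
# If dfs.replication were set to 8 (one copy per node), a 2 GB file would
# cost 2 x 8 = 16 GB of raw HDFS storage -- matching the dfsadmin -report
# number from the question below.
FILE_GB=2
REPLICATION=8   # hypothetical value, to be verified in the posted config
echo "$((FILE_GB * REPLICATION)) GB"   # -> 16 GB
```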

Cheers

Sent from my mobile. Please excuse the typos.

On 2010-04-10, at 3:22 PM, "Michael Segel" <[email protected]> wrote:



Mike,

First, you need to see what you set your block size to in Hadoop. By default it's 64 MB. With large files, you may want to bump that up to 128 MB per block.
A 2 GB file will then split into roughly 16 to 32 map tasks, one per block, depending on the block size.
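The block arithmetic behind that estimate, as a quick sketch (each HDFS block becomes roughly one map task):

```shell
# Number of blocks for a 2 GB file at the two block sizes mentioned above
FILE_MB=2048
echo $((FILE_MB / 64))    # 64 MB blocks  -> 32
echo $((FILE_MB / 128))   # 128 MB blocks -> 16
```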

I'd use hadoop fs -copyFromLocal <local file name> <hdfs file name>.

(OK, I'm going from memory on the Hadoop command, but you can always run hadoop fs -help to see the commands.)
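If the block size is bumped to 128 MB, it can also be set for a single copy via a generic -D option. A sketch, assuming a Hadoop 0.20-era property name and reusing the /a.dat and /data paths from the question below (needs a live cluster, so it is a CLI fragment rather than a runnable script):

```shell
# 128 MB in bytes: 128 * 1024 * 1024 = 134217728
# Copy the local file into HDFS with a 128 MB block size for this file only
hadoop fs -D dfs.block.size=134217728 -copyFromLocal /a.dat /data/a.dat
```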

Also, you need to see what you set for your replication factor. Usually it's 3.

Then your 2 GB file will take roughly 6 GB of raw storage and should be balanced across all of the nodes, with 2 or 3 blocks per machine.
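The storage and placement arithmetic, as a quick sketch (the 128 MB block size is an assumption from the suggestion above):

```shell
# Raw disk usage for a 2 GB file with the usual replication factor of 3
FILE_GB=2
REPLICATION=3
echo "$((FILE_GB * REPLICATION)) GB on disk"         # -> 6 GB

# Placement over the 8-node cluster, assuming 128 MB blocks:
BLOCKS=$((2048 / 128))                               # 16 blocks
echo "$((BLOCKS * REPLICATION / 8)) replicas/node"   # -> 6
```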

HTH

-Mike

Date: Sat, 10 Apr 2010 14:03:02 -0400
Subject: copying file into hdfs
From: [email protected]
To: [email protected]

Hi,

I'm Mike, a new user of Hadoop. Currently I have a cluster of 8 machines and a
file of size 2 gigs.
When I load it into HDFS using the command
hadoop dfs -put /a.dat /data
it actually loads it onto all the data nodes; dfsadmin -report shows HDFS usage
of 16 gigs, and it is taking 2 hours to load that data file.

With 1 node, my MapReduce operation on this data took 150 seconds.

When I run the same MapReduce operation on this cluster, it takes 220 seconds
for the same file.

Can someone please tell me how to distribute this file over 8 nodes, so that
each of them holds roughly 300 MB of the file and the MapReduce operation that
I have written runs in parallel? Isn't a Hadoop cluster supposed to work in
parallel?

best.

