Encoding problem.

2009-04-16 Thread Edward J. Yoon
Hi, I wanted to read data in EUC-KR format using UTF-8, so I set up the JVM parameter -Dfile.encoding=EUC-KR in HADOOP_OPTS. But it did not work. Is there any method other than coding my own input format? -- Best Regards, Edward J. Yoon edwardy...@apache.org http://blog.udanax.org
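[One workaround that avoids a custom input format -- a minimal sketch, not from the original thread: keep TextInputFormat but decode the raw line bytes as EUC-KR yourself in the mapper. Text hands you the bytes as read from the file, and since EUC-KR is ASCII-compatible, TextInputFormat's newline-based splitting is still safe. The class below is illustrative:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class EucKrMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, NullWritable> {
      public void map(LongWritable key, Text value,
          OutputCollector<Text, NullWritable> output, Reporter reporter)
          throws IOException {
        // Text exposes the raw bytes of the line; decode them as EUC-KR
        // instead of trusting Text's UTF-8 string conversion.
        String line = new String(value.getBytes(), 0, value.getLength(), "EUC-KR");
        // ... process 'line'; emitting through Text re-encodes it as UTF-8.
        output.collect(new Text(line), NullWritable.get());
      }
    }
]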

Re: Encoding problem.

2009-04-16 Thread Edward J. Yoon
My typo: I meant reading it with TextInputFormat (UTF-8). On Thu, Apr 16, 2009 at 4:18 PM, Edward J. Yoon edwardy...@apache.org wrote: Hi, I wanted to read data in EUC-KR format using UTF-8, so I set up the JVM parameter -Dfile.encoding=EUC-KR in HADOOP_OPTS. But it did not work. Is there any other

No space left on device Exception

2009-04-16 Thread Rakhi Khatwani
Hi, I am running a map-reduce program on a 6-node EC2 cluster, and after a couple of hours all my tasks hang. So I started digging into the logs: there were no logs for the regionserver and none for the tasktracker. However, for the jobtracker I get the following: 2009-04-16 03:00:29,691 INFO

RE: No space left on device Exception

2009-04-16 Thread Desai, Milind B
From the exception it appears that there is no space left on the machine. You can check using 'df'. Thanks Milind -Original Message- From: Rakhi Khatwani [mailto:rakhi.khatw...@gmail.com] Sent: Thursday, April 16, 2009 1:15 PM To: hbase-u...@hadoop.apache.org; core-user@hadoop.apache.org

Re: No space left on device Exception

2009-04-16 Thread Rakhi Khatwani
Hi, following is the output of the df command:

    [r...@domu-12-31-39-00-e5-d2 conf]# df -h
    Filesystem   Size  Used  Avail  Use%  Mounted on
    /dev/sda1    9.9G  4.2G   5.2G   45%  /
    /dev/sdb     414G  924M   392G    1%  /mnt

From the output it seems that I have quite an amount of

Re: Map-Reduce Slow Down

2009-04-16 Thread Mithila Nagendra
Thanks Jason! Will check that out. Mithila On Thu, Apr 16, 2009 at 5:23 AM, jason hadoop jason.had...@gmail.com wrote: Double check that there is no firewall in place. At one point a bunch of new machines were kickstarted and placed in a cluster and they all failed with something similar. It

Re: No space left on device Exception

2009-04-16 Thread Miles Osborne
It may be that intermediate results are filling your disks, and when the jobs crash this all gets deleted, so it would look like you have spare space when in reality you don't. I would check on the file system as your jobs run and see if it is indeed filling up. Miles 2009/4/16 Rakhi

Re: No space left on device Exception

2009-04-16 Thread Rakhi Khatwani
Thanks, I will check that. Regards, Raakhi On Thu, Apr 16, 2009 at 1:42 PM, Miles Osborne mi...@inf.ed.ac.uk wrote: It may be that intermediate results are filling your disks, and when the jobs crash this all gets deleted, so it would look like you have spare space when in reality

Re: Generating many small PNGs to Amazon S3 with MapReduce

2009-04-16 Thread tim robertson
Hi Chuck, Thank you very much for this opportunity. I also think it is a nice case study; it goes beyond the typical wordcount example by generating something that people can actually see and play with immediately afterwards (e.g. maps). It also nicely showcases the community effort to

Re: Interesting Hadoop/FUSE-DFS access patterns

2009-04-16 Thread Tom White
Not sure if it will affect your findings, but when you read from an FSDataInputStream you should check how many bytes were actually read by inspecting the return value, and re-read if it was fewer than you wanted. See Hadoop's IOUtils readFully() method. Tom On Mon, Apr 13, 2009 at 4:22 PM, Brian
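[For reference, a minimal sketch of the pattern Tom describes; the path and buffer size are illustrative, and the buffer is assumed to be no larger than the file. IOUtils.readFully loops internally until the requested number of bytes has arrived, throwing if the stream ends first:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class ReadFullyExample {
      public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        FSDataInputStream in = fs.open(new Path("/data/sample.bin")); // hypothetical path
        byte[] buf = new byte[64 * 1024];
        try {
          // A bare in.read(buf) may return fewer bytes than requested;
          // IOUtils.readFully loops until buf.length bytes have been read.
          IOUtils.readFully(in, buf, 0, buf.length);
        } finally {
          IOUtils.closeStream(in);
        }
      }
    }
]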

Migration

2009-04-16 Thread Rakhi Khatwani
Hi, in case we migrate from hadoop 0.19.0 and hbase 0.19.0 to hadoop 0.20.0 and hbase 0.20.0 respectively, how would it affect the existing data on the hadoop dfs and the hbase tables? Can we migrate the data using distcp only? Regards Raakhi

Re: Interesting Hadoop/FUSE-DFS access patterns

2009-04-16 Thread Brian Bockelman
Hey Tom, Yup, that's one of the things I've been looking at - however, it doesn't appear to be the likely culprit as to why data access is fairly random. The time an operation takes does not seem to be a function of the number of bytes read, at least in the smaller range. Brian On Apr

Re: Generating many small PNGs to Amazon S3 with MapReduce

2009-04-16 Thread tim robertson
However, do the math on the costs for S3. We were doing something similar, and found that we were spending a fortune on our put requests at $0.01 per 1000, and next to nothing on storage. I've since moved to a more complicated model where I pack many small items in each object and store an
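[For context, a minimal sketch of the packing model Tim describes, under my own assumptions (plain local files stand in for S3 objects; all names are illustrative): many small tiles are concatenated into one object, and a byte-offset index lets individual tiles be served later with ranged reads, so you pay for one PUT instead of thousands:

    import java.io.*;
    import java.util.LinkedHashMap;
    import java.util.Map;

    public class TilePacker {
      // Concatenates small tile files into one packed file and returns an
      // index of name -> {offset, length} for later ranged reads.
      public static Map<String, long[]> pack(File[] tiles, File packed)
          throws IOException {
        Map<String, long[]> index = new LinkedHashMap<String, long[]>();
        OutputStream out = new BufferedOutputStream(new FileOutputStream(packed));
        long offset = 0;
        try {
          for (File tile : tiles) {
            byte[] bytes = readAll(tile);
            out.write(bytes);
            index.put(tile.getName(), new long[] { offset, bytes.length });
            offset += bytes.length;
          }
        } finally {
          out.close();
        }
        return index; // persist alongside the packed object
      }

      private static byte[] readAll(File f) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        InputStream in = new FileInputStream(f);
        try {
          byte[] b = new byte[8192];
          for (int n; (n = in.read(b)) != -1; ) buf.write(b, 0, n);
        } finally {
          in.close();
        }
        return buf.toByteArray();
      }
    }
]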

Hadoop Presentation at Ankara / Turkey

2009-04-16 Thread Enis Soztutar
Hi all, I will be giving a presentation on Hadoop at 1. Ulusal Yüksek Başarım ve Grid Konferansı tomorrow (Apr 17, 13:10). The conference location is at KKM ODTU/Ankara/Turkey. The presentation will be in Turkish. All the Hadoop users and wanna-be users in the area are welcome to attend. More

Re: No space left on device Exception

2009-04-16 Thread Pankil Doshi
Hey, what's your input size? From the info you gave it seems you have used 4.2 GB, so if that's your input size, your intermediate results are most likely smaller than your input, but that depends on your map function. Make sure about the size of the intermediate results. Pankil On Thu, Apr 16,

Re: Complex workflows in Hadoop

2009-04-16 Thread Shevek
On Tue, 2009-04-14 at 07:59 -0500, Pankil Doshi wrote: Hey, I am trying complex queries on hadoop, in which I require more than one job to run to get the final result. The results of job one capture a few joins of the query, and I want to pass those results as input to the 2nd job and again do

Re: Complex workflows in Hadoop

2009-04-16 Thread jason hadoop
Chaining, described in chapter 8 of my book, provides this to a limited degree. Cascading, http://www.cascading.org/, also supports complex flows. I do not know how Cascading works under the covers. On Thu, Apr 16, 2009 at 8:23 AM, Shevek had...@anarres.org wrote: On Tue, 2009-04-14 at 07:59

Re: Complex workflows in Hadoop

2009-04-16 Thread Vadim Zaliva
Cascading is great. If you are looking for a more pragmatic approach, one which would allow you to build a workflow from existing Hadoop tasks and PIG scripts without writing additional Java code, you may want to take a look at HAMAKE: http://code.google.com/p/hamake/ Vadim

RE: Complex workflows in Hadoop

2009-04-16 Thread Brian MacKay
Have you looked at ChainMapper and ChainReducer? They may not be entirely what you require, but with some modifications perhaps they might work for you. Using the ChainMapper and ChainReducer classes it is possible to compose Map/Reduce jobs that look like [MAP+ / REDUCE MAP*]. And immediate
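[For orientation, a minimal sketch of that [MAP+ / REDUCE MAP*] shape with the old mapred API; the mapper and reducer classes (AMap, BMap, WordReduce, CMap) are hypothetical placeholders, not classes from this thread:

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.ChainMapper;
    import org.apache.hadoop.mapred.lib.ChainReducer;

    public class ChainExample {
      public static JobConf buildJob() {
        JobConf job = new JobConf(ChainExample.class);
        JobConf stage = new JobConf(false); // per-stage conf; empty here

        // Two chained map stages run inside the same map task...
        ChainMapper.addMapper(job, AMap.class, LongWritable.class, Text.class,
            Text.class, Text.class, true, stage);
        ChainMapper.addMapper(job, BMap.class, Text.class, Text.class,
            Text.class, Text.class, true, stage);
        // ...followed by one reducer, then a post-reduce map stage.
        ChainReducer.setReducer(job, WordReduce.class, Text.class, Text.class,
            Text.class, Text.class, true, stage);
        ChainReducer.addMapper(job, CMap.class, Text.class, Text.class,
            Text.class, Text.class, true, stage);
        return job;
      }
    }
]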

Re: Generating many small PNGs to Amazon S3 with MapReduce

2009-04-16 Thread Todd Lipcon
On Thu, Apr 16, 2009 at 1:27 AM, tim robertson timrobertson...@gmail.com wrote: What is not 100% clear to me is when to push to S3: In the Map I will output the TileId-ZoomLevel-SpeciesId as the key, along with the count, and in the Reduce I group the counts into larger tiles, and create the

Re: getting DiskErrorException during map

2009-04-16 Thread Alex Loddengaard
Have you set hadoop.tmp.dir away from /tmp as well? If hadoop.tmp.dir is set somewhere in /scratch vs. /tmp, then I'm not sure why Hadoop would be writing to /tmp. Hope this helps! Alex On Wed, Apr 15, 2009 at 2:37 PM, Jim Twensky jim.twen...@gmail.com wrote: Alex, Yes, I bounced the

Re: Error reading task output

2009-04-16 Thread Cam Macdonell
Cam Macdonell wrote: Hi, I'm getting the following warning when running the simple wordcount and grep examples. 09/04/15 16:54:16 INFO mapred.JobClient: Task Id : attempt_200904151649_0001_m_19_0, Status : FAILED Too many fetch-failures 09/04/15 16:54:16 WARN mapred.JobClient: Error

Re: getting DiskErrorException during map

2009-04-16 Thread Jim Twensky
Yes, here is how it looks:

    <property>
      <name>hadoop.tmp.dir</name>
      <value>/scratch/local/jim/hadoop-${user.name}</value>
    </property>

so I don't know why it still writes to /tmp. As a temporary workaround, I created a symbolic link from /tmp/hadoop-jim to /scratch/... and it works fine

Question about the classpath setting for bin/hadoop jar

2009-04-16 Thread Cole, Richard
Hi, I noticed that the bin/hadoop jar command doesn't add the jar being executed to the classpath. Is this deliberate and what is the reasoning? The result is that resources in the jar are not accessible from the system class loader. Rather they are only available from the thread context class

hadoop0.18.3 64-bit AMI

2009-04-16 Thread Parul Kudtarkar
Could anyone suggest a working 64-bit Hadoop AMI which is publicly available? The hadoop 0.18.3 64-bit AMI is not publicly available. It would be nice if someone could release it. Thanks, Parul V. Kudtarkar

Seattle / PNW Hadoop + Lucene User Group?

2009-04-16 Thread Bradford Stephens
Greetings, Would anybody be willing to join a PNW Hadoop and/or Lucene User Group with me in the Seattle area? I can donate some facilities, etc. -- I also always have topics to speak about :) Cheers, Bradford

Re: Hadoop basic question

2009-04-16 Thread Jim Twensky
http://wiki.apache.org/hadoop/FAQ#7 On Thu, Apr 16, 2009 at 6:52 PM, Jae Joo jaejo...@gmail.com wrote: Will anyone guide me on how to avoid the single point of failure of the master node? This is what I know: if the master node is down for some reason, the hadoop system is down and there is no way

Re: Hadoop basic question

2009-04-16 Thread Jeff Hammerbacher
Also see http://files.meetup.com/1228907/Hadoop%20Namenode%20High%20Availability.pptx . On Thu, Apr 16, 2009 at 4:58 PM, Jim Twensky jim.twen...@gmail.com wrote: http://wiki.apache.org/hadoop/FAQ#7 On Thu, Apr 16, 2009 at 6:52 PM, Jae Joo jaejo...@gmail.com wrote: Will anyone guide me how

Sometimes no map tasks are run - X are complete and N-X are pending, none running

2009-04-16 Thread Saptarshi Guha
Hello, I'm using 0.19.2-dev-core (checked out from cvs and built). With 51 maps, I have a case where 50 tasks have completed and 1 is pending, with about 1400 records left for this one to process. The completed map tasks have written 18 GB to the HDFS. The last map task is forever in the pending

Re: hadoop0.18.3 64-bit AMI

2009-04-16 Thread Lalit Kapoor
Parul, do a search for Cloudera; they have a 64-bit AMI available. Also take a look at this: http://www.cloudera.com/hadoop-ec2. You can start up a Hadoop cluster quickly and be on your way (good for proofs of concept and one-time jobs, like they state). Sincerely, Lalit Kapoor On Thu, Apr

Re: Map-Reduce Slow Down

2009-04-16 Thread jason hadoop
The kickstart script was something that the operations staff was using to initialize new machines, I never actually saw the script, just figured out that there was a firewall in place. On Thu, Apr 16, 2009 at 1:28 PM, Mithila Nagendra mnage...@asu.edu wrote: Jason: the kickstart script - was

Re: Map-Reduce Slow Down

2009-04-16 Thread jason hadoop
The firewall was run at system startup; I think there was an /etc/sysconfig/iptables file present which triggered the firewall. I don't currently have access to any CentOS 5 machines so I can't easily check. On Thu, Apr 16, 2009 at 6:54 PM, jason hadoop jason.had...@gmail.com wrote: The

Re: Map-Reduce Slow Down

2009-04-16 Thread Mithila Nagendra
Thanks! I'll see what I can find out. On Fri, Apr 17, 2009 at 4:55 AM, jason hadoop jason.had...@gmail.com wrote: The firewall was run at system startup; I think there was an /etc/sysconfig/iptables file present which triggered the firewall. I don't currently have access to any CentOS 5

Re: Error reading task output

2009-04-16 Thread Aaron Kimball
Cam, This isn't Hadoop-specific, it's how Linux treats its network configuration. If you look at /etc/host.conf, you'll probably see a line that says order hosts, bind -- this is telling Linux's DNS resolution library to first read your /etc/hosts file, then check an external DNS server. You
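[To illustrate the resolution order Aaron describes -- the entries below are made up, not from this thread. With the stock resolver order, /etc/hosts wins over DNS, so a stale or loopback entry there can produce exactly this kind of fetch failure:

    # /etc/host.conf -- consult /etc/hosts first, then DNS
    order hosts,bind

    # /etc/hosts -- make sure the hostname other nodes use for this machine
    # maps to a routable address, not to 127.0.0.1
    127.0.0.1    localhost
    10.0.0.11    node1.example.internal node1
]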

Re: More Replication on dfs

2009-04-16 Thread Aaron Kimball
That setting will instruct future file writes to replicate two-fold. This has no bearing on existing files; replication can be set on a per-file basis, so they already have their replication set in the DFS individually. Use the command: bin/hadoop fs -setrep [-R] repl_factor filename... to
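[For example, to set the replication factor to 2 recursively for everything under a directory (the path here is illustrative):

    bin/hadoop fs -setrep -R 2 /user/data
]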