Hi,
I wanted to read the data in EUC-KR format using UTF-8, so I set up
a JVM parameter -Dfile.encoding=EUC-KR in HADOOP_OPTS, but it did
not work. Is there any method other than coding my own input format?
--
Best Regards, Edward J. Yoon
edwardy...@apache.org
http://blog.udanax.org
Correcting my typo: I am reading it using TextInputFormat (UTF-8).
On Thu, Apr 16, 2009 at 4:18 PM, Edward J. Yoon edwardy...@apache.org wrote:
Hi,
I wanted to read the data in EUC-KR format using UTF-8, so I set up
a JVM parameter -Dfile.encoding=EUC-KR in HADOOP_OPTS, but it did
not work. Is there any other
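One workaround, just a sketch on my side rather than anything built into
Hadoop: TextInputFormat hands the mapper the raw file bytes inside a Text
object, and it is only Text.toString() that assumes UTF-8, so you can decode
the bytes with the EUC-KR charset yourself in the map method. The class name
below is only illustrative.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class EucKrLineMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  public void map(LongWritable key, Text value,
      OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // value holds the raw EUC-KR bytes of the line; value.toString()
    // would decode them as UTF-8 and mangle them, so decode explicitly.
    String line = new String(value.getBytes(), 0, value.getLength(), "EUC-KR");
    // ... process the correctly decoded line, then emit normal (UTF-8) Text ...
    output.collect(new Text(line), new Text(""));
  }
}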
Hi,
I am running a map-reduce program on a 6-node EC2 cluster, and after a
couple of hours all my tasks hang.
So I started digging into the logs:
there were no logs for the regionserver,
no logs for the tasktracker.
However, for the jobtracker I get the following:
2009-04-16 03:00:29,691 INFO
From the exception it appears that there is no space left on the machine. You
can check using 'df'.
Thanks
Milind
-Original Message-
From: Rakhi Khatwani [mailto:rakhi.khatw...@gmail.com]
Sent: Thursday, April 16, 2009 1:15 PM
To: hbase-u...@hadoop.apache.org; core-user@hadoop.apache.org
Hi,
following is the output of the df command:
[r...@domu-12-31-39-00-e5-d2 conf]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             9.9G  4.2G  5.2G  45% /
/dev/sdb              414G  924M  392G   1% /mnt
From the output it seems that I have quite an amount of
Thanks Jason! Will check that out.
Mithila
On Thu, Apr 16, 2009 at 5:23 AM, jason hadoop jason.had...@gmail.com wrote:
Double check that there is no firewall in place.
At one point a bunch of new machines were kickstarted and placed in a
cluster and they all failed with something similar.
It
It may be that intermediate results are filling your disks, and when
the jobs crash this all gets deleted, so it would look like you have
spare space when in reality you don't.
I would check on the file system as your jobs run and see if indeed
they are filling up.
Miles
2009/4/16 Rakhi
Thanks,
I will check that
Regards,
Raakhi
On Thu, Apr 16, 2009 at 1:42 PM, Miles Osborne mi...@inf.ed.ac.uk wrote:
It may be that intermediate results are filling your disks, and when
the jobs crash this all gets deleted, so it would look like you have
spare space when in reality
Hi Chuck,
Thank you very much for this opportunity. I also think it is a nice
case study; it goes beyond the typical wordcount example by generating
something that people can actually see and play with immediately
afterwards (e.g. maps). It also nicely showcases the community
effort to
Not sure if it will affect your findings, but when you read from an
FSDataInputStream you should see how many bytes were actually read by
inspecting the return value, and re-read if it was fewer than you wanted.
See Hadoop's IOUtils readFully() method.
Tom
On Mon, Apr 13, 2009 at 4:22 PM, Brian
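A rough sketch of the re-read pattern Tom describes (my own illustration,
essentially what IOUtils.readFully() does):

import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ReadFullyExample {
  // Keep reading until the buffer is full, since a single read() may
  // legitimately return fewer bytes than requested.
  public static void readFully(InputStream in, byte[] buf) throws IOException {
    int off = 0;
    while (off < buf.length) {
      int n = in.read(buf, off, buf.length - off);
      if (n < 0) {
        throw new EOFException("Stream ended after " + off + " bytes");
      }
      off += n;
    }
  }
  // Alternatively: org.apache.hadoop.io.IOUtils.readFully(in, buf, 0, buf.length);
}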
Hi,
In case we migrate from Hadoop 0.19.0 and HBase 0.19.0 to Hadoop 0.20.0
and HBase 0.20.0 respectively, how would it affect the existing data on
the Hadoop DFS and the HBase tables? Can we migrate the data using distcp only?
Regards
Raakhi
Hey Tom,
Yup, that's one of the things I've been looking at - however, it
doesn't appear to be the likely culprit as to why data access is
fairly random. The time the operation took does not seem to be a
function of the number of bytes read, at least in the smaller range.
Brian
On Apr
However, do the math on the costs for S3. We were doing something similar,
and found that we were spending a fortune on our put requests at $0.01 per
1000, and next to nothing on storage. I've since moved to a more complicated
model where I pack many small items in each object and store an
Hi all,
I will be giving a presentation on Hadoop at 1. Ulusal Yüksek Başarım
ve Grid Konferansı tomorrow (Apr 17, 13:10). The conference location is
at KKM ODTU/Ankara/Turkey. Presentation will be in Turkish. All the
Hadoop users and wanna-be users in the area are welcome to attend.
More
Hey
What's your input size?
From the info you gave, it seems you have used 4.2 GB, so if that's your
input size, your intermediate results are most likely smaller than your
input. But that also depends on your map function. Make sure to check the
size of the intermediate results.
Pankil
On Thu, Apr 16,
On Tue, 2009-04-14 at 07:59 -0500, Pankil Doshi wrote:
Hey,
I am trying complex queries on Hadoop, in which I require more than one
job to run to get the final result. The results of job one capture a few joins
of the query, and I want to pass those results as input to the 2nd job and again do
The chaining described in chapter 8 of my book provides this to a limited
degree.
Cascading, http://www.cascading.org/, also supports complex flows. I do not
know how Cascading works under the covers.
On Thu, Apr 16, 2009 at 8:23 AM, Shevek had...@anarres.org wrote:
On Tue, 2009-04-14 at 07:59
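For the simple case of feeding job one's output straight into job two, a
plain driver also works. This is only a sketch: the mapper/reducer class
names (FirstStageMapper etc.) and the paths are placeholders, and it assumes
the old 0.19-era mapred API.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class TwoStageDriver {
  public static void main(String[] args) throws Exception {
    Path input = new Path(args[0]);
    Path intermediate = new Path(args[1]);   // output of job 1, input of job 2
    Path output = new Path(args[2]);

    JobConf job1 = new JobConf(TwoStageDriver.class);
    job1.setJobName("stage-1-joins");
    job1.setMapperClass(FirstStageMapper.class);      // placeholder class
    job1.setReducerClass(FirstStageReducer.class);    // placeholder class
    job1.setOutputKeyClass(Text.class);
    job1.setOutputValueClass(Text.class);
    FileInputFormat.setInputPaths(job1, input);
    FileOutputFormat.setOutputPath(job1, intermediate);
    JobClient.runJob(job1);                  // blocks until job 1 finishes

    JobConf job2 = new JobConf(TwoStageDriver.class);
    job2.setJobName("stage-2-final");
    job2.setMapperClass(SecondStageMapper.class);     // placeholder class
    job2.setReducerClass(SecondStageReducer.class);   // placeholder class
    job2.setOutputKeyClass(Text.class);
    job2.setOutputValueClass(Text.class);
    FileInputFormat.setInputPaths(job2, intermediate);  // read job 1's output
    FileOutputFormat.setOutputPath(job2, output);
    JobClient.runJob(job2);
  }
}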
Cascading is great.
If you are looking for a more pragmatic approach that would allow you to
build a workflow from existing Hadoop tasks and Pig scripts without writing
additional Java code, you may want to take a look at HAMAKE:
http://code.google.com/p/hamake/
Vadim
Have you looked at ChainMapper and ChainReducer? They may not be entirely
what you require, but with some modifications perhaps they might work for
you.
Using the ChainMapper and ChainReducer classes it is possible to
compose Map/Reduce jobs that look like [MAP+ / REDUCE MAP*]. An
immediate
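Roughly along these lines in the job driver (a sketch only; AMap, BMap,
CReduce and DMap are placeholders for your own classes, and whether you can
pass byValue=false depends on whether the chained maps may safely share
key/value instances):

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.ChainMapper;
import org.apache.hadoop.mapred.lib.ChainReducer;

public class ChainDriver {
  public static void configureChain(JobConf job) {
    // Chain of the form [MAP+ / REDUCE MAP*]: two maps, a reduce, one more map.
    // AMap, BMap, CReduce and DMap are placeholders for your own classes.
    ChainMapper.addMapper(job, AMap.class,
        LongWritable.class, Text.class, Text.class, Text.class,
        true, new JobConf(false));
    ChainMapper.addMapper(job, BMap.class,
        Text.class, Text.class, Text.class, Text.class,
        true, new JobConf(false));
    ChainReducer.setReducer(job, CReduce.class,
        Text.class, Text.class, Text.class, Text.class,
        true, new JobConf(false));
    ChainReducer.addMapper(job, DMap.class,
        Text.class, Text.class, Text.class, Text.class,
        true, new JobConf(false));
  }
}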
On Thu, Apr 16, 2009 at 1:27 AM, tim robertson timrobertson...@gmail.com wrote:
What is not 100% clear to me is when to push to S3:
In the Map I will output the TileId-ZoomLevel-SpeciesId as the key,
along with the count, and in the Reduce I group the counts into larger
tiles, and create the
Have you set hadoop.tmp.dir away from /tmp as well? If hadoop.tmp.dir is
set somewhere in /scratch vs. /tmp, then I'm not sure why Hadoop would be
writing to /tmp.
Hope this helps!
Alex
On Wed, Apr 15, 2009 at 2:37 PM, Jim Twensky jim.twen...@gmail.com wrote:
Alex,
Yes, I bounced the
Cam Macdonell wrote:
Hi,
I'm getting the following warning when running the simple wordcount and
grep examples.
09/04/15 16:54:16 INFO mapred.JobClient: Task Id :
attempt_200904151649_0001_m_19_0, Status : FAILED
Too many fetch-failures
09/04/15 16:54:16 WARN mapred.JobClient: Error
Yes, here is how it looks:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/scratch/local/jim/hadoop-${user.name}</value>
</property>
so I don't know why it still writes to /tmp. As a temporary workaround, I
created a symbolic link from /tmp/hadoop-jim to /scratch/...
and it works fine
Hi,
I noticed that the bin/hadoop jar command doesn't add the jar being executed
to the classpath. Is this deliberate, and what is the reasoning? The result is
that resources in the jar are not accessible from the system class loader.
Rather, they are only available from the thread context class
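In the meantime, a workaround along these lines (my sketch; the resource name
is made up) looks the resource up through the context class loader, which
does see the executed jar:

import java.io.InputStream;

public class ResourceLookup {
  public static InputStream openBundledResource() {
    // The system class loader does not see the jar passed to "bin/hadoop jar",
    // but the thread context class loader set up by the jar runner does.
    ClassLoader loader = Thread.currentThread().getContextClassLoader();
    // "conf/my-settings.xml" is just an example resource name.
    return loader.getResourceAsStream("conf/my-settings.xml");
  }
}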
Could anyone suggest a working 64-bit Hadoop AMI which is publicly
available? The Hadoop 0.18.3 64-bit AMI is not publicly available. It would be
nice if someone could release it.
Thanks,
Parul V. Kudtarkar
Greetings,
Would anybody be willing to join a PNW Hadoop and/or Lucene User Group
with me in the Seattle area? I can donate some facilities, etc. -- I
also always have topics to speak about :)
Cheers,
Bradford
http://wiki.apache.org/hadoop/FAQ#7
On Thu, Apr 16, 2009 at 6:52 PM, Jae Joo jaejo...@gmail.com wrote:
Will anyone guide me on how to avoid the single point of failure of the
master node?
This is what I know: if the master node is down for some reason, the Hadoop
system is down and there is no way
Also see
http://files.meetup.com/1228907/Hadoop%20Namenode%20High%20Availability.pptx
.
On Thu, Apr 16, 2009 at 4:58 PM, Jim Twensky jim.twen...@gmail.com wrote:
http://wiki.apache.org/hadoop/FAQ#7
On Thu, Apr 16, 2009 at 6:52 PM, Jae Joo jaejo...@gmail.com wrote:
Will anyone guide me how
Hello,
I'm using 0.19.2-dev-core (checked out from cvs and built). With 51 maps, I
have a case where 50 tasks have completed and 1 is pending, with about 1400
records left for this one to process. The completed map tasks have written
out 18 GB to HDFS.
The last map task is forever in the pending
Parul,
Do a search for Cloudera; they have a 64-bit AMI available. Also take a
look at this: http://www.cloudera.com/hadoop-ec2 - you can start up a Hadoop
cluster quickly and be on your way (good for proofs of concept and one-time
jobs, as they state).
Sincerely,
Lalit Kapoor
On Thu, Apr
The kickstart script was something that the operations staff was using to
initialize new machines. I never actually saw the script; I just figured out
that there was a firewall in place.
On Thu, Apr 16, 2009 at 1:28 PM, Mithila Nagendra mnage...@asu.edu wrote:
Jason: the kickstart script - was
The firewall was run at system startup; I think there was a
/etc/sysconfig/iptables file present which triggered the firewall.
I don't currently have access to any centos 5 machines so I can't easily
check.
On Thu, Apr 16, 2009 at 6:54 PM, jason hadoop jason.had...@gmail.com wrote:
The
Thanks! I'll see what I can find out.
On Fri, Apr 17, 2009 at 4:55 AM, jason hadoop jason.had...@gmail.com wrote:
The firewall was run at system startup; I think there was a
/etc/sysconfig/iptables file present which triggered the firewall.
I don't currently have access to any centos 5
Cam,
This isn't Hadoop-specific; it's how Linux treats its network configuration.
If you look at /etc/host.conf, you'll probably see a line that says "order
hosts, bind" -- this is telling Linux's DNS resolution library to first read
your /etc/hosts file, then check an external DNS server.
You
That setting will instruct future file writes to replicate two-fold. This
has no bearing on existing files; replication can be set on a per-file
basis, so existing files already have their replication factors set in the
DFS individually.
Use the command: bin/hadoop fs -setrep [-R] repl_factor filename...
to
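The same thing can also be done programmatically through the FileSystem API
if that is more convenient. A sketch (the path and factor are just examples):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SetReplicationExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Set the replication factor of one existing file to 2 (example path).
    fs.setReplication(new Path("/user/example/data/part-00000"), (short) 2);
    fs.close();
  }
}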