On Wednesday 09 July 2008 05:56:28 Amar Kamat wrote:
Andreas Kostyrka wrote:
See the attached screenshot; I wonder how that could happen?
What Hadoop version are you using? Is this reproducible? Is it possible
to get the JT logs?
Hadoop 0.17.0
Reproducible: As such, no. I did notice that
The proposal on http://issues.apache.org/jira/browse/HADOOP-3386 takes
care of this.
Thanks
Amareshwari
Amareshwari Sriramadasu wrote:
If the task tracker didn't receive a KillJobAction, it's true that the job
directory will not be removed.
And your observation is correct that some task trackers didn't
Hi Ashish
I am very excited to try this, having been evaluating Hadoop, HBase,
Cascading, etc. recently to process 100 million biodiversity records
(expecting billions soon), with a view to data mining (species
that are critically endangered and observed outside of protected areas
Hi all,
I want to use Hadoop for some streaming text processing on text
documents like:
<doc id=... ... ...>
text text
text
...
</doc>
Just XML-like notation, but not real XML files.
I have to work on the text included between the doc tags, so I implemented an
InputFormat (extending FileInputFormat)
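For reference, a minimal sketch of such an InputFormat against the 0.17
(org.apache.hadoop.mapred) API could look roughly like the code below; the class
names here are invented and this is not the poster's actual code. It marks files as
non-splittable so a <doc>...</doc> block never straddles a split, and it buffers the
lines between the markers into one record:

import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.LineRecordReader;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class DocInputFormat extends FileInputFormat<LongWritable, Text> {

  // Keep each <doc>...</doc> block inside one split by disabling splitting.
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false;
  }

  public RecordReader<LongWritable, Text> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    return new DocRecordReader(job, (FileSplit) split);
  }

  static class DocRecordReader implements RecordReader<LongWritable, Text> {
    private final LineRecordReader lines;
    private final LongWritable lineKey = new LongWritable();
    private final Text line = new Text();

    DocRecordReader(JobConf job, FileSplit split) throws IOException {
      lines = new LineRecordReader(job, split);
    }

    // Emit one record per <doc>...</doc> block: key = byte offset of the
    // opening tag, value = the text between the tags.
    public boolean next(LongWritable key, Text value) throws IOException {
      StringBuilder doc = new StringBuilder();
      boolean inDoc = false;
      while (lines.next(lineKey, line)) {
        String s = line.toString();
        if (s.startsWith("<doc")) {
          inDoc = true;
          key.set(lineKey.get());
        } else if (s.startsWith("</doc")) {
          value.set(doc.toString());
          return true;
        } else if (inDoc) {
          doc.append(s).append('\n');
        }
      }
      return false;  // no further complete document in this file
    }

    public LongWritable createKey() { return new LongWritable(); }
    public Text createValue() { return new Text(); }
    public long getPos() throws IOException { return lines.getPos(); }
    public float getProgress() throws IOException { return lines.getProgress(); }
    public void close() throws IOException { lines.close(); }
  }
}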
So we can fix this issue by putting all three users in a common group? We did
that after we encountered the issue, but we still got the errors. Note that we
had not restarted hadoop, so the permissions were still as described earlier.
Should we have restarted Hadoop after the grouping?
On Wed,
I was following the Hadoop 0.17.0 quickstart guide (Windows and Cygwin).
First of all:
1) in C:\cygwin\home\bstarchev change hadoop-env.sh and copy it into
C:\hadoop-0.17.0\conf:
echo 'export JAVA_HOME=/cygdrive/c/Program Files/Java/jdk1.5.0_12' >> hadoop-env.sh
2) in C:\cygwin\home\bstarchev create
Thanks Lohit. The key point I missed was that the dfs.hosts.exclude
file should exist before starting the namenode. It worked after
restarting HDFS.
-Chris
On Jul 8, 2008, at 3:56 PM, lohit wrote:
there are a few things which aren't documented.
- you should have defined the full path of the file as
I noticed the same thing recently. For me it happened because the datanodes were
deleting a lot of blocks. I was doing something like:
bin/hadoop fs -rm 4Gb; sleep 10; bin/hadoop fs -put 4Gb-input 4Gb;
This is because, when a datanode is deleting blocks, it does not inform the
namenode about the new
It would be very convenient to have this available for building unit
tests for map reduce jobs.
In the interests of avoiding NIH, I am hoping this has been done
Happy Elephant riding!
--
Jason Venner
Attributor - Program the Web http://www.attributor.com/
Attributor is hiring Hadoop
So far, I've had no luck.
Can anyone out there clarify the permissible characters/format for AWS
keys and bucket names?
I haven't looked at the code here, but it seems strange to me that the
same restrictions on host/port etc apply given that it's a totally
different system. I'd love to see
Nothing like missing a jar file hadoop-...test.jar in the distribution :-[
Jason Venner wrote:
It would be very convenient to have this available for building unit
tests for map reduce jobs.
In the interests of avoiding NIH, I am hoping this has been done
Happy Elephant riding!
--
In case people are interested:
Google has released its Protocol Buffers under the Apache license. It
generates (de)serialization code for structured data in
Java/C++/Python from a simple schema description.
http://code.google.com/p/protobuf/
Should be pretty simple to wrap the generated code
Has anyone looked at Facebook Thrift:
http://developers.facebook.com/thrift/
It seems to do essentially the same thing as Protocol Buffers, and I am
curious if anyone has looked at either or both and has any thoughts. We need
a solution for fast server-to-server communication, so any insight
Set number of map slots per tasktracker to 8 in order to run 8 map tasks
on one machine (assuming one tasktracker per machine) at the same time:
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
  <description>The maximum number of map tasks that will be run
  simultaneously by a task tracker.</description>
</property>
Hi Joman,
The temp directory we are talking about here is the temp directory on the local file
system (i.e. Unix, in your case). There is a config property, hadoop.tmp.dir
(see hadoop-default.xml), which specifies the path of the temp directory. Before
you start the cluster, you should set this property and
I have extensive experience with Thrift, and have been playing with
protocol buffers for a couple days.
Thrift is a more complete RPC solution, including client and server
implementations, whereas PB is just a data exchange format. If you want
a ready-to-go RPC server, use Thrift. If you want
I've come across this problem before. My simple solution was to
regenerate new keys until I got one without a slash... ;)
-Jimmy
I have Hadoop 0.17.1 and an AWS Secret Key that contains a slash ('/').
With distcp, I found that using the URL format s3://ID:SECRET@BUCKET/
did not work,
Hi Tim,
Point well taken. We are trying to get this out as soon as possible.
Thanks for the offer for helping us test this things out. We will get
something out to you (an early version) as soon as we have a logical
feature checkpoint.
Cheers,
Ashish
-Original Message-
From: tim
Thanks for the reply.
I've heard the regenerate suggestion before, but for organizations
that use their AWS keys all over the place this is a huge pain. I think it would
be better to come up with a more robust solution to handling AWS info.
-lincoln
--
lincolnritter.com
On Wed, Jul 9, 2008 at 12:44
I regenerated my AWS Secret Key to one that does not use a slash, and
I was able to successfully use the s3://ID:SECRET@BUCKET/ style URL
for distcp. It seems the S3 FileSystem is not decoding URLs
properly. I've filed a bug:
https://issues.apache.org/jira/browse/HADOOP-3733
-Stuart
On
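One workaround that sidesteps the URL-escaping problem entirely (a suggestion, not
something from the thread): supply the credentials through the
fs.s3.awsAccessKeyId / fs.s3.awsSecretAccessKey configuration properties, either in
hadoop-site.xml or programmatically, instead of embedding them in the s3:// URL. A
minimal sketch, with placeholder key values and bucket name:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3CredentialCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder credentials: set the real values here (or in hadoop-site.xml)
    // so the secret never has to appear, URL-encoded or not, in the s3:// URI.
    conf.set("fs.s3.awsAccessKeyId", "MY_ACCESS_KEY_ID");
    conf.set("fs.s3.awsSecretAccessKey", "MY/SECRET/KEY");   // slash is harmless here

    Path bucket = new Path("s3://my-bucket/");               // placeholder bucket
    FileSystem s3 = bucket.getFileSystem(conf);
    for (FileStatus f : s3.listStatus(bucket)) {
      System.out.println(f.getPath());
    }
  }
}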
I'm unable to ship a file with a .zip suffix to the mapper using the
-file argument for Hadoop streaming. I am able to ship it if I change
the suffix to .zipp. Is this a bug, or does it perhaps have something to do
with the jar file format which is used to send files to the instance?
For example,
Hi,
Maybe you should try looking at JobControl (see TestJobControl.java for a
concrete example; a rough sketch is also included below the quoted message).
Regards,
Lukas
On Wed, Jul 9, 2008 at 10:28 PM, Mori Bellamy [EMAIL PROTECTED] wrote:
Hey all,
I'm trying to chain multiple mapreduce jobs together to accomplish a
complex task. I believe that
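A rough sketch of the JobControl approach mentioned above, against the 0.17 mapred
API; the two JobConfs are placeholders, not from the thread, and each stage would be
configured however your real jobs require:

import java.util.ArrayList;

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.jobcontrol.Job;
import org.apache.hadoop.mapred.jobcontrol.JobControl;

public class ChainJobs {
  public static void main(String[] args) throws Exception {
    // Placeholder JobConfs: set mapper/reducer classes and input/output
    // paths for each stage as your real jobs require.
    JobConf firstConf = new JobConf(ChainJobs.class);
    JobConf secondConf = new JobConf(ChainJobs.class);

    Job first = new Job(firstConf, new ArrayList());
    Job second = new Job(secondConf, new ArrayList());
    second.addDependingJob(first);   // second runs only after first succeeds

    JobControl control = new JobControl("chained-jobs");
    control.addJob(first);
    control.addJob(second);

    // JobControl is a Runnable that submits jobs as their dependencies finish.
    Thread runner = new Thread(control);
    runner.start();
    while (!control.allFinished()) {
      Thread.sleep(5000);
    }
    control.stop();

    if (control.getFailedJobs() != null && !control.getFailedJobs().isEmpty()) {
      System.err.println("Some jobs failed: " + control.getFailedJobs());
    }
  }
}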
It seems like this problem could be solved with one map-reduce job.
From your input, map out (ID, {type, TimeStamp}).
In your reduce, you can figure out how many A1's appear close to
each other. One naive approach is to iterate through all of the sets
and collect them in some collection class.
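A very rough sketch of that single-job idea against the 0.17 mapred API; the
"id,type,timestamp" record layout is just a placeholder since the real input format
isn't shown in the thread, and the closeness check itself is left out:

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Map: key each record by ID and emit "type<TAB>timestamp" as the value.
public class EventMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {
  public void map(LongWritable offset, Text line,
                  OutputCollector<Text, Text> out, Reporter reporter)
      throws IOException {
    // Placeholder parsing: assume one "id,type,timestamp" record per line.
    String[] fields = line.toString().split(",");
    out.collect(new Text(fields[0]), new Text(fields[1] + "\t" + fields[2]));
  }
}

// Reduce: collect all (type, timestamp) pairs for one ID and scan them for A1
// events that fall close together (the actual closeness test is omitted here).
class EventReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {
  public void reduce(Text id, Iterator<Text> values,
                     OutputCollector<Text, Text> out, Reporter reporter)
      throws IOException {
    List<String> events = new ArrayList<String>();
    while (values.hasNext()) {
      events.add(values.next().toString());  // naive: buffer everything in memory
    }
    // ... examine 'events' for A1 entries with nearby timestamps ...
    out.collect(id, new Text(Integer.toString(events.size())));
  }
}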