Re: Finished or not?

2008-07-09 Thread Andreas Kostyrka
On Wednesday 09 July 2008 05:56:28 Amar Kamat wrote: Andreas Kostyrka wrote: See attached screenshot, wonder how that could happen? What Hadoop version are you using? Is this reproducible? Is it possible to get the JT logs? Hadoop 0.17.0 Reproducible: As such no. I did notice that

Re: Tasktrackers job cache directories not always cleaned up

2008-07-09 Thread Amareshwari Sriramadasu
The proposal on http://issues.apache.org/jira/browse/HADOOP-3386 takes care of this. Thanks, Amareshwari. Amareshwari Sriramadasu wrote: If a task tracker didn't receive KillJobAction, it's true that the job directory will not be removed. And your observation is correct that some task trackers didn't

Re: FW: [jira] Updated: (HADOOP-3601) Hive as a contrib project

2008-07-09 Thread tim robertson
Hi Ashish, I am very excited to try this, having recently been evaluating Hadoop, HBase, Cascading etc. to process 100 million biodiversity records (expecting billions soon), with a view to data mining (species that are critically endangered and observed outside of protected areas

Custom InputFormat/OutputFormat

2008-07-09 Thread Francesco Tamberi
Hi all, I want to use Hadoop for some streaming text processing on text documents like: <doc id=...> ... text text text ... </doc> Just XML-like notation, but not real XML files. I have to work on the text included between the doc tags, so I implemented an InputFormat (extending FileInputFormat)
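
For anyone following along, here is a minimal sketch (not Francesco's actual code) of what such an InputFormat can look like against the 0.17-era org.apache.hadoop.mapred API. The class name, and the assumption that the doc start/end tags sit on their own lines, are mine; it marks each file unsplittable so a doc block never straddles a split.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class DocInputFormat extends FileInputFormat<LongWritable, Text> {

  // One split per file keeps a <doc> block from straddling a split boundary.
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false;
  }

  public RecordReader<LongWritable, Text> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    return new DocRecordReader((FileSplit) split, job);
  }

  static class DocRecordReader implements RecordReader<LongWritable, Text> {
    private final BufferedReader in;
    private final long length;
    private long pos = 0;   // approximate position, counted in characters

    DocRecordReader(FileSplit split, JobConf job) throws IOException {
      FileSystem fs = split.getPath().getFileSystem(job);
      FSDataInputStream stream = fs.open(split.getPath());
      in = new BufferedReader(new InputStreamReader(stream, "UTF-8"));
      length = split.getLength();
    }

    public boolean next(LongWritable key, Text value) throws IOException {
      StringBuilder doc = new StringBuilder();
      boolean inDoc = false;
      String line;
      while ((line = in.readLine()) != null) {
        pos += line.length() + 1;
        if (line.startsWith("<doc")) {          // opening tag starts a record
          inDoc = true;
        } else if (line.startsWith("</doc")) {  // closing tag ends it
          key.set(pos);
          value.set(doc.toString());
          return true;
        } else if (inDoc) {
          doc.append(line).append('\n');
        }
      }
      return false;                             // no more complete docs
    }

    public LongWritable createKey() { return new LongWritable(); }
    public Text createValue() { return new Text(); }
    public long getPos() { return pos; }
    public float getProgress() {
      return length == 0 ? 1.0f : Math.min(1.0f, pos / (float) length);
    }
    public void close() throws IOException { in.close(); }
  }
}
```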

Re: File permissions issue

2008-07-09 Thread Joman Chu
So we can fix this issue by putting all three users in a common group? We did that after we encountered the issue, but we still got the errors. Note that we had not restarted hadoop, so the permissions were still as described earlier. Should we have restarted Hadoop after the grouping? On Wed,

Failed to run the Quickstart guide for Pseudo-distributed operation

2008-07-09 Thread boris starchev
I was following the Hadoop 0.17.0 quickstart guide (Windows and Cygwin). First of all: 1) in C:\cygwin\home\bstarchev, change hadoop-env.sh and copy it to C:\hadoop-0.17.0\conf: echo 'export JAVA_HOME=/cygdrive/c/Program Files/Java/jdk1.5.0_12' >> hadoop-env.sh 2) in C:\cygwin\home\bstarchev, create
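
For reference, the pseudo-distributed settings the 0.17 quickstart asks for are below. A frequent Cygwin stumbling block is the space in "Program Files": the JAVA_HOME value in conf/hadoop-env.sh should be quoted (or point at a space-free path). The conf/hadoop-site.xml values are the ones listed in the quickstart itself.

```xml
<!-- conf/hadoop-site.xml for pseudo-distributed operation,
     values as given in the 0.17 quickstart -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```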

Re: Cannot decommission on 16.4

2008-07-09 Thread Chris Kline
Thanks Lohit. The key point I missed was that the file named by dfs.hosts.exclude should exist before starting the namenode. It worked after restarting HDFS. -Chris On Jul 8, 2008, at 3:56 PM, lohit wrote: there are a few things which aren't documented. - you should have defined the full path of the file as
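
To spell out the working setup: the namenode must be started with dfs.hosts.exclude already pointing at a file that exists (it may be empty). The path below is illustrative, not from the thread.

```xml
<!-- hadoop-site.xml on the namenode: point at an exclude file that exists
     (even if empty) before the namenode is started -->
<property>
  <name>dfs.hosts.exclude</name>
  <value>/path/to/conf/dfs.exclude</value>
</property>
```

Hosts added to that file afterwards are then decommissioned with bin/hadoop dfsadmin -refreshNodes, without restarting HDFS.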

Re: NotReplicatedYetException

2008-07-09 Thread Raghu Angadi
I noticed the same recently. For me it happened because the datanodes were deleting a lot of blocks. I was doing something like: bin/hadoop fs -rm 4Gb; sleep 10; bin/hadoop fs -put 4Gb-input 4Gb; This is because, when a datanode is deleting blocks, it does not inform the namenode about the new

Has anyone packed up the src/test/..../ClusterMapReduceTestCase into a separate jar for use by external code

2008-07-09 Thread Jason Venner
It would be very convenient to have this available for building unit tests for map/reduce jobs. In the interest of avoiding NIH, I am hoping this has already been done. Happy Elephant riding! -- Jason Venner Attributor - Program the Web http://www.attributor.com/ Attributor is hiring Hadoop
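
To make the use case concrete, a unit test built on ClusterMapReduceTestCase might look like the sketch below, assuming the hadoop test jar is on the classpath. The base class is expected to start a mini DFS/MapReduce cluster in setUp(); the helper method names (createJobConf, getFileSystem) are recalled from the test harness and should be checked against the Hadoop source.

```java
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.ClusterMapReduceTestCase;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.mapred.TextInputFormat;

public class MyJobTest extends ClusterMapReduceTestCase {

  public void testIdentityJobRuns() throws Exception {
    JobConf conf = createJobConf();      // JobConf wired to the mini cluster
    FileSystem fs = getFileSystem();     // mini-DFS started by the base class

    Path in = new Path("testing/in");
    Path out = new Path("testing/out");
    fs.mkdirs(in);
    FSDataOutputStream file = fs.create(new Path(in, "part-0"));
    file.writeBytes("hello world\nhello hadoop\n");
    file.close();

    conf.setJobName("mini-cluster-smoke-test");
    conf.setInputFormat(TextInputFormat.class);
    FileInputFormat.setInputPaths(conf, in);
    FileOutputFormat.setOutputPath(conf, out);
    // Default IdentityMapper/IdentityReducer, so the job just copies the input.

    RunningJob job = JobClient.runJob(conf);
    assertTrue(job.isSuccessful());
  }
}
```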

Re: Namenode Exceptions with S3

2008-07-09 Thread Lincoln Ritter
So far, I've had no luck. Can anyone out there clarify the permissible characters/format for AWS keys and bucket names? I haven't looked at the code here, but it seems strange to me that the same restrictions on host/port etc. apply, given that it's a totally different system. I'd love to see

Re: Has anyone packed up the src/test/..../ClusterMapReduceTestCase into a separate jar for use by external code

2008-07-09 Thread Jason Venner
Nothing like missing a jar file hadoop-...test.jar from the distribution :-[ Jason Venner wrote: It would be very convenient to have this available for building unit tests for map/reduce jobs. In the interest of avoiding NIH, I am hoping this has already been done. Happy Elephant riding! --

Google Protocol Buffers - structured binary data

2008-07-09 Thread Stuart Sierra
In case people are interested: Google has released its Protocol Buffers under the Apache license. It generates (de)serialization code for structured data in Java/C++/Python from a simple schema description. http://code.google.com/p/protobuf/ Should be pretty simple to wrap the generated code
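
To give a flavour of what "a simple schema description" means, here is a tiny illustrative .proto file (message and field names are made up). Running protoc with --java_out generates the Java (de)serialization classes from it.

```proto
// Illustrative proto2 schema; generate Java code with:
//   protoc --java_out=src/ record.proto
message LogRecord {
  required string source    = 1;
  optional int64  timestamp = 2;
  repeated string tags      = 3;
}
```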

Re: Google Protocol Buffers - structured binary data

2008-07-09 Thread hank williams
Has anyone looked at Facebook Thrift? http://developers.facebook.com/thrift/ It seems to do essentially the same thing as Protocol Buffers, and I am curious whether anyone has looked at either or both and has any thoughts. We need a solution for fast server-to-server communication, so any insight

RE: parallel mapping on single server

2008-07-09 Thread Haijun Cao
Set the number of map slots per tasktracker to 8 in order to run 8 map tasks on one machine (assuming one tasktracker per machine) at the same time: <property> <name>mapred.tasktracker.map.tasks.maximum</name> <value>8</value> <description>The maximum number of map tasks that will be run simultaneously
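
For completeness, the snippet as it would appear in conf/hadoop-site.xml on the node, with the analogous reduce-slot property alongside; the tasktracker has to be restarted for the change to take effect.

```xml
<!-- conf/hadoop-site.xml on the tasktracker node -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>8</value>
</property>
```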

Re: File permissions issue

2008-07-09 Thread s29752-hadoopuser
Hi Joman, The temp directory we are talking about here is the temp directory in the local file system (i.e. Unix in your case). There is a config property, hadoop.tmp.dir (see hadoop-default.xml), which specifies the path of the temp directory. Before you start the cluster, you should set this property and
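
For example, the property can be set in conf/hadoop-site.xml before the cluster is started; the path below is illustrative (Hadoop expands ${user.name} from the system properties).

```xml
<!-- conf/hadoop-site.xml: move the local temp directory out of /tmp -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/tmp/hadoop-${user.name}</value>
</property>
```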

Re: Google Protocol Buffers - structured binary data

2008-07-09 Thread Matt Kent
I have extensive experience with Thrift, and have been playing with protocol buffers for a couple days. Thrift is a more complete RPC solution, including client and server implementations, whereas PB is just a data exchange format. If you want a ready-to-go RPC server, use Thrift. If you want

slash in AWS Secret Key, WAS Re: Namenode Exceptions with S3

2008-07-09 Thread Jimmy Lin
I've come across this problem before. My simple solution was to regenerate new keys until I got one without a slash... ;) -Jimmy I have Hadoop 0.17.1 and an AWS Secret Key that contains a slash ('/'). With distcp, I found that using the URL format s3://ID:[EMAIL PROTECTED]/ did not work,
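
A workaround that avoids putting the secret in the URI at all (and therefore sidesteps the slash-escaping problem) is to set the credentials in the configuration and use a bare s3:// URI. The property names below are the ones the S3 FileSystem of that era reads; the key values and bucket name are placeholders.

```xml
<!-- hadoop-site.xml: keep the AWS credentials out of the URI -->
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>YOUR/SECRET/KEY</value>
</property>
```

With these set, distcp can be invoked with plain s3://bucket/path URIs.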

RE: FW: [jira] Updated: (HADOOP-3601) Hive as a contrib project

2008-07-09 Thread Ashish Thusoo
Hi Tim, Point well taken. We are trying to get this out as soon as possible. Thanks for the offer to help us test these things out. We will get something out to you (an early version) as soon as we have a logical feature checkpoint. Cheers, Ashish -Original Message- From: tim

Re: slash in AWS Secret Key, WAS Re: Namenode Exceptions with S3

2008-07-09 Thread Lincoln Ritter
Thanks for the reply. I've heard the regenerate suggestion before, but for organizations that use AWS all over the place this is a huge pain. I think it would be better to come up with a more robust solution for handling AWS info. -lincoln -- lincolnritter.com On Wed, Jul 9, 2008 at 12:44

Re: Namenode Exceptions with S3

2008-07-09 Thread Stuart Sierra
I regenerated my AWS Secret Key to one that does not contain a slash, and I was able to successfully use the s3://ID:[EMAIL PROTECTED]/ style URL for distcp. It seems the S3 FileSystem is not decoding URLs properly. I've filed a bug: https://issues.apache.org/jira/browse/HADOOP-3733 -Stuart On

can't package zip file with hadoop streaming -file argument?

2008-07-09 Thread Karl Anderson
I'm unable to ship a file with a .zip suffix to the mapper using the -file argument for Hadoop streaming. I am able to ship it if I change the suffix to .zipp. Is this a bug, or does it perhaps have something to do with the jar file format which is used to send files to the instance? For example,
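
An illustration of the reported behaviour (the script and data file names are made up, and the streaming jar path may differ between builds):

```sh
bin/hadoop jar contrib/streaming/hadoop-0.17.0-streaming.jar \
    -input in -output out \
    -mapper mapper.py -reducer NONE \
    -file mapper.py \
    -file lookup.zip    # reportedly fails; renaming the file to lookup.zipp works
```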

Re: How to chain multiple hadoop jobs?

2008-07-09 Thread Lukas Vlcek
Hi, Maybe you should try looking at JobControl (see TestJobControl.java for a particular example). Regards, Lukas On Wed, Jul 9, 2008 at 10:28 PM, Mori Bellamy [EMAIL PROTECTED] wrote: Hey all, I'm trying to chain multiple MapReduce jobs together to accomplish a complex task. I believe that
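
A minimal sketch of chaining two dependent jobs with org.apache.hadoop.mapred.jobcontrol, as Lukas suggests. The names are recalled from the 0.17-era API and should be checked against TestJobControl.java; conf1 and conf2 stand in for fully configured JobConfs.

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.jobcontrol.Job;
import org.apache.hadoop.mapred.jobcontrol.JobControl;

public class ChainJobs {
  public static void runChain(JobConf conf1, JobConf conf2) throws Exception {
    Job first = new Job(conf1);
    Job second = new Job(conf2);
    second.addDependingJob(first);        // second starts only after first succeeds

    JobControl control = new JobControl("chain");
    control.addJob(first);
    control.addJob(second);

    Thread runner = new Thread(control);  // JobControl implements Runnable
    runner.start();
    while (!control.allFinished()) {
      Thread.sleep(1000);                 // poll until both jobs are done
    }
    control.stop();
  }
}
```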

Re: help with hadoop program

2008-07-09 Thread Mori Bellamy
It seems like this problem could be solved with one map-reduce task. From your input, map out (ID, {type, TimeStamp}); in your reduce, you can figure out how many A1's appear close to each other. One naive approach is to iterate through all of the sets and collect them in some collection class.
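
A sketch of that shape against the 0.17-era org.apache.hadoop.mapred API. The tab-separated input layout (id, type, timestamp) and the 60-second "closeness" window are assumptions for illustration only, not from the original thread.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class EventGrouping {

  // Emit (id, "type \t timestamp") so all events for one id meet in a reducer.
  public static class EventMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
      String[] fields = value.toString().split("\t");   // id, type, timestamp
      output.collect(new Text(fields[0]), new Text(fields[1] + "\t" + fields[2]));
    }
  }

  // Collect the A1 timestamps for each id and count neighbours within the window.
  public static class EventReducer extends MapReduceBase
      implements Reducer<Text, Text, Text, LongWritable> {
    private static final long WINDOW = 60;               // "close" = within 60 seconds

    public void reduce(Text key, Iterator<Text> values,
                       OutputCollector<Text, LongWritable> output, Reporter reporter)
        throws IOException {
      List<Long> a1Times = new ArrayList<Long>();
      while (values.hasNext()) {
        String[] fields = values.next().toString().split("\t");
        if ("A1".equals(fields[0])) {
          a1Times.add(Long.parseLong(fields[1]));
        }
      }
      Collections.sort(a1Times);
      long close = 0;
      for (int i = 1; i < a1Times.size(); i++) {
        if (a1Times.get(i) - a1Times.get(i - 1) <= WINDOW) {
          close++;
        }
      }
      output.collect(key, new LongWritable(close));
    }
  }
}
```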