Max number of files in HDFS?

2007-08-28 Thread Taeho Kang
Dear All, Hi, my name is Taeho and I am trying to figure out the maximum number of files a namenode can hold. The main reason for doing this is that I want some estimate of how many files I can put into HDFS without overflowing the Namenode machine's memory. I know the number

not reducing

2007-08-28 Thread Torsten Curdt
We came across an issue where our jobs failed to report back to the tracker. (https://issues.apache.org/jira/browse/HADOOP-1790) Now we are getting a little further: the map phase is working just fine, but the reduce seems to be stuck at 0%. We are seeing the following in the logs:

Re: Max number of files in HDFS?

2007-08-28 Thread Enis Soztutar
Taeho Kang wrote: Hello Sameer. Thank you for your useful link. It's been very helpful! By the way, our Hadoop cluster has a namenode with 4GBytes of RAM. Based on the analysis found in the HADOOP-1687 ( http://issues.apache.org/jira/browse/HADOOP-1687), we could probably state that for

Re: Max number of files in HDFS?

2007-08-28 Thread Sameer Paranjpye
Taeho Kang wrote: Hello Sameer. Thank you for your useful link. It's been very helpful! By the way, our Hadoop cluster has a namenode with 4GBytes of RAM. Based on the analysis found in the HADOOP-1687 ( http://issues.apache.org/jira/browse/HADOOP-1687), we could probably state that for
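As a rough back-of-the-envelope check (my own arithmetic, not the measured figures from HADOOP-1687): if each file, directory, and block costs on the order of 150 bytes of namenode heap, which is the rule of thumb usually quoted, then a 4 GB heap covers roughly 4,000,000,000 / 150, about 26 million namespace objects. With one block per file that works out to something like 13 million files, and fewer once you leave headroom for the rest of the JVM and for multi-block files.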

Re: looking for some help with pig syntax

2007-08-28 Thread Alan Gates
I think the following will do what you want. t1 = load table1 as id, listOfId; t2 = load table2 as id, f1; t1a = foreach t1 generate flatten(listOfId); -- flattens the listOfId into a set of ids b = join t1a by $0, t2 by id; -- join the two together. c = foreach b generate t2.id, t2.f1; --

RE: looking for some help with pig syntax

2007-08-28 Thread Joydeep Sen Sarma
Will it? Trying an example: t1 = {1, 2, 3, 4} t2 = {2, alpha,3,beta,4,gamma} desired outcome c = {1, alpha, beta, gamma} /* or alternatively */ c = {1, 2,alpha,3,beta,4,gamma} but as proposed (I hope I am reading the pig document correctly): t1a = {2,3,4} b = {2, 2, alpha} //

Re: looking for some help with pig syntax

2007-08-28 Thread Alan Gates
Sorry, I misunderstood what you were trying to generate. Perhaps the following will come closer: t1 = load table1 as id, listOfId; -- 1, 2,3,4 t2 = load table2 as id, f1; -- 2,a,3,b,4,c a = foreach t1 generate id, flatten(listOfId); -- 1,2,1,3,1,4 b = join a by $0, t2 by id; --
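A minimal sketch of the same approach with each statement on its own line, using positional references after the join; the table and field names are the hypothetical ones from the example, and it assumes listOfId is loaded as a bag (Utkarsh's note at the end of this thread covers the bag vs. tuple distinction):

t1 = load 'table1' as (id, listOfId);            -- (1, {(2),(3),(4)})
t2 = load 'table2' as (id, f1);                  -- (2,alpha) (3,beta) (4,gamma)
a = foreach t1 generate id, flatten(listOfId);   -- (1,2) (1,3) (1,4)
b = join a by $1, t2 by id;                      -- match the flattened ids against t2
c = foreach b generate $0, $3;                   -- (1,alpha) (1,beta) (1,gamma)

A further group of c by $0 would be needed if the goal is one tuple per t1 row, as in Joydeep's desired output.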

FW: Removing files after processing

2007-08-28 Thread Stu Hood
Does anyone have any ideas on this issue? Otherwise, if I were to write a patch to add this option for jobs to Hadoop, would it be useful for anyone else? Thanks Stu -Original Message- From: Stu Hood [EMAIL PROTECTED] Sent: Fri, August 24, 2007 9:43 am To:

Re: FW: Removing files after processing

2007-08-28 Thread Doug Cutting
I think this is related to HADOOP-1558: https://issues.apache.org/jira/browse/HADOOP-1558 Per-job cleanups that are not run clientside must be run in a separate JVM, since we, as a rule, don't run user code in long-lived daemons. Doug Stu Hood wrote: Does anyone have any ideas on this
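For the clientside case, the simplest workaround is presumably to have the submitting program block on JobClient.runJob() and, once the job has finished successfully, delete the processed inputs itself with FileSystem.delete() before exiting.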

Re: FW: Removing files after processing

2007-08-28 Thread Matt Kent
I would find it useful to have some sort of listener mechanism, where you could register an object to be notified of a job completion event and then respond to it accordingly. Matt On 8/28/07, Stu Hood [EMAIL PROTECTED] wrote: Does anyone have any ideas on this issue? Otherwise, if I were

Re: FW: Removing files after processing

2007-08-28 Thread Doug Cutting
Matt Kent wrote: I would find it useful to have some sort of listener mechanism, where you could register an object to be notified of a job completion event and then respond to it accordingly. There is a job completion notification feature. <property> <name>job.end.notification.url</name>
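A minimal sketch of how that property might be set in hadoop-site.xml; the endpoint URL is hypothetical, and if I recall the feature correctly the $jobId and $jobStatus variables are substituted by the framework before it issues an HTTP GET when the job finishes:

<property>
  <name>job.end.notification.url</name>
  <!-- hypothetical endpoint; $jobId and $jobStatus are filled in by Hadoop -->
  <value>http://myserver.example.com/jobnotify/$jobId/$jobStatus</value>
</property>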

RE: secondary namenode errors

2007-08-28 Thread Joydeep Sen Sarma
I don't think the secondary namenodes are working throughout, so I'm not sure they are a factor. What I observed: - stopped dfs, took a backup copy of the current/ directory - restarted dfs with the new 0.13.1 - after the file system is back up, fsck says the fs is corrupt. A large number of files have blocks

Re: Hbase scripts problem

2007-08-28 Thread Michele Catasta
Hi Michael, thanks for the detailed answer, it has been helpful (especially the log4j DEBUG level for all those classes). Check the logs to see if you can get a clue as to what is going on. Did the cluster HMaster get the shutdown signal? (Is it running the shutdown sequence?) Logs are in

Re: Hbase scripts problem

2007-08-28 Thread Michael Stack
Michele Catasta wrote: Hi Michael, thanks for the detailed answer, it has been helpful (especially the log4j DEBUG level for all those classes). Check the logs to see if you can get a clue as to what is going on. Did the cluster HMaster get the shutdown signal? (Is it running the

RE: looking for some help with pig syntax

2007-08-28 Thread Joydeep Sen Sarma
I am misunderstanding something. Following the intro to pig-latin doc (p6), the flatten generating 'a' would generate 1,2,3,4 (and not 1,2,1,3,1,4) -Original Message- From: Alan Gates [mailto:[EMAIL PROTECTED] Sent: Tuesday, August 28, 2007 12:47 PM To: hadoop-user@lucene.apache.org Cc:

Re: looking for some help with pig syntax

2007-08-28 Thread Utkarsh Srivastava
Hi, there are 2 different data types in Pig: i) Tuple: a collection of fields, like a database record; ii) Bag: a collection of tuples, like a database table. In t1 = load table1 as id, listOfId; if listOfId is a bag, flattening will give you (1,2) (1,3) (1,4). If listOfId is a tuple, flattening