input files

2008-08-20 Thread Deepak Diwakar
Hadoop usually takes either a single file or a folder as an input parameter.
But is it possible to modify it so that it can take a list of files (not a
folder) as the input parameter?


-- 
- Deepak Diwakar,


Re: input files

2008-08-20 Thread Amareshwari Sriramadasu
You can add more input paths using 
FileInputFormat.addInputPath(JobConf, Path).
You can also specify comma-separated filenames as the input path using 
FileInputFormat.setInputPaths(JobConf, String commaSeparatedPaths).
More details at 
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/FileInputFormat.html


You can also use a glob pattern to specify multiple paths with a single path string.
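
For illustration, here is a minimal, untested sketch of those three options using the old org.apache.hadoop.mapred API (the class name and file names below are just placeholders; note that setInputPaths replaces whatever was set before, while addInputPath appends):

file: MultiFileInputExample.java (hypothetical)

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;

public class MultiFileInputExample {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(MultiFileInputExample.class);

    // Option 1: add individual files one at a time (appends to the input list).
    FileInputFormat.addInputPath(conf, new Path("/user/me/input/a.txt"));
    FileInputFormat.addInputPath(conf, new Path("/user/me/input/b.txt"));

    // Option 2: one call with a comma-separated list of files (replaces the list).
    FileInputFormat.setInputPaths(conf, "/user/me/input/a.txt,/user/me/input/b.txt");

    // Option 3: a glob pattern that matches several files (also replaces the list).
    FileInputFormat.setInputPaths(conf, new Path("/user/me/input/*.txt"));

    // ... set mapper, reducer, output path, etc., then:
    // JobClient.runJob(conf);
  }
}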

Thanks
Amareshwari
Deepak Diwakar wrote:

Hadoop usually takes either a single file or a folder as an input parameter.
But is it possible to modify it so that it can take a list of files (not a
folder) as the input parameter?




Re: Missing lib/native/Linux-amd64-64 on hadoop-0.17.2.tar.gz

2008-08-20 Thread Yi-Kai Tsai

Hi,

Could anyone help re-pack 0.17.2 with the missing
lib/native/Linux-amd64-64?


thanks

On Wed, Aug 20, 2008 at 9:31 AM, Yi-Kai Tsai [EMAIL PROTECTED] wrote:

  

But we do have lib/native/Linux-amd64-64 in hadoop-0.17.1.tar.gz and
hadoop-0.18.0.tar.gz?




At least for 0.17.1, yes, there is.

Regards,

Leon Mergen
  



--
Yi-Kai Tsai (cuma) [EMAIL PROTECTED], Asia Regional Search Engineering.



Hadoop 0.17.2 released

2008-08-20 Thread Owen O'Malley
Hadoop Core 0.17.2 has been released and the website updated. It fixes  
a couple of critical bugs in the 0.17 branch. It can be downloaded from:


http://www.apache.org/dyn/closer.cgi/hadoop/core/

-- Owen


Re: Cannot read reducer values into a list

2008-08-20 Thread Owen O'Malley


On Aug 19, 2008, at 4:57 PM, Deepika Khera wrote:


Thanks for the clarification on this.

So, it seems like cloning the object before adding to the list is the
only solution for this problem. Is that right?


Yes. You can use WritableUtils.clone to do the job.
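
For anyone curious, here is a minimal, untested sketch of what that might look like with the old org.apache.hadoop.mapred API (the class name and the Text/LongWritable types are just placeholder choices):

file: CloningReducer.java (hypothetical)

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableUtils;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class CloningReducer extends MapReduceBase
    implements Reducer<Text, LongWritable, Text, LongWritable> {

  private JobConf conf;

  public void configure(JobConf job) {
    super.configure(job);
    this.conf = job;  // WritableUtils.clone needs a Configuration
  }

  public void reduce(Text key, Iterator<LongWritable> values,
      OutputCollector<Text, LongWritable> output, Reporter reporter)
      throws IOException {
    // The framework reuses the same value object on each call to next(),
    // so clone each value before adding it to the list.
    List<LongWritable> buffered = new ArrayList<LongWritable>();
    while (values.hasNext()) {
      buffered.add(WritableUtils.clone(values.next(), conf));
    }
    // buffered now holds independent copies that are safe to keep around.
    for (LongWritable v : buffered) {
      output.collect(key, v);
    }
  }
}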

-- Owen


Re: pseudo-global variable constuction

2008-08-20 Thread Sandy
Thank you very much, Paco and Jason. It works!

For any users who may be curious what this may look like in code, here is a
small snippet of mine:

file: myLittleMRProgram.java
package org.apache.hadoop.examples;

  public static class Reduce extends MapReduceBase implements Reducer<Text,
LongWritable, Text, LongWritable> {
    private int nTax = 0;

    public void configure(JobConf job) {
        super.configure(job);
        String tax = job.get("nTax");
        nTax = Integer.parseInt(tax);
    }

    public void reduce(Text key, Iterator<LongWritable> values,
        OutputCollector<Text, LongWritable> output, Reporter reporter)
        throws IOException {
        System.out.println("nTax is: " + nTax);
    }
  }

  main() {
    conf.set("nTax", other_args.get(2));
    JobClient.runJob(conf);
    return 0;
  }



-SM

On Tue, Aug 19, 2008 at 5:02 PM, Jason Venner [EMAIL PROTECTED] wrote:

 Since the map and reduce tasks generally run in separate Java virtual
 machines, and on machines distinct from your main task's Java virtual machine,
 there is no sharing of variables between the main task and the map or reduce
 tasks.

 The standard way is to store the variable in the Configuration (or JobConf)
 object in your main task.
 Then, in the configure method of your map and reduce task classes, extract the
 variable value from the JobConf object.

 You will need to override the configure method in your
 map and reduce classes.

 This also requires that the variable value be serializable.

 For lots of large variables this can be expensive.


 Sandy wrote:

 Hello,


 My M/R program is going smoothly, except for one small problem. I have a
 global variable that is set by the user (and thus in the main function),
 that I want one of my reduce functions to access. This is a read-only
 variable. After some reading in the forums, I tried something like this:

 file: MyGlobalVars.java
 package org.apache.hadoop.examples;
 public class MyGlobalVars {
static public int nTax;
 }
 --

 file: myLittleMRProgram.java
 package org.apache.hadoop.examples;
 map function() {
   System.out.println("in map function, nTax is: " + MyGlobalVars.nTax);
 }

 main() {
 MyGlobalVars.nTax = other_args.get(2);
 System.out.println("in main function, nTax is: " + MyGlobalVars.nTax);

 JobClient.runJob(conf);

 return 0;
 }

 When I run it, I get:
 in main function, nTax is 20 (which is what I want)
 in map function, nTax is 0 (<--- this is not right).


 I am a little confused about how to resolve this. I apologize in advance if
 this is a blatant Java error; I only began programming in the language a
 few weeks ago.

 Since MapReduce tries to avoid the whole shared-memory scene, I am more
 than willing to have each reduce function receive a local copy of this
 user-defined value. However, I am a little confused about what the best
 way to do this would be. As I see it, my options are:

 1.) Write the user-defined value to HDFS in the main function, and have it
 read from HDFS in the reduce function. I can't quite figure out the code to
 do this, though. I know how to specify a single input file for the map
 reduce task, but if I did it this way, won't I need to specify two separate
 input files?

 2.) Put it in the construction of the reduce object (I saw this mentioned
 in the archives). How would I accomplish this exactly when the value is
 user defined? Parameter passing? If so, won't this require me to change the
 underlying MapReduceBase (which makes me a touch nervous, since I'm still
 very new to Hadoop)?

 What would be the easiest way to do this?

 Thanks in advance for the help. I appreciate your time.

 -SM



 --
 Jason Venner
 Attributor - Program the Web http://www.attributor.com/
 Attributor is hiring Hadoop Wranglers and coding wizards, contact if
 interested



Reminder: Monthly Hadoop User Group Meeting (Bay Area) today

2008-08-20 Thread Ajay Anand
Reminder: The next Hadoop User Group (Bay Area) meeting is scheduled for
today, Wednesday, Aug 20th, from 6 - 7:30 pm at Yahoo! Mission College,
Santa Clara, CA, Building 1, Training Rooms 3 & 4.

 

Agenda:

Pig Update: Olga Natkovich
Hadoop 0.18 and post 0.18 - Sameer Paranjpye

 

Registration and directions: http://upcoming.yahoo.com/event/1011188

 

Look forward to seeing you there!

Ajay



Know how many records remain?

2008-08-20 Thread Qin Gao
Hi mailing,

Is there any way to know whether the mapper is processing the last record
assigned to this node, or how many records remain to be processed
on this node?


Qin


Re: Why is scaling HBase much simpler then scaling a relational db?

2008-08-20 Thread Stuart Sierra
On Tue, Aug 19, 2008 at 9:44 AM, Mork0075 [EMAIL PROTECTED] wrote:
 Can you please explain why someone should use HBase for horizontal
 scaling instead of a relational database? One reason for me would be
 that I don't have to implement the sharding logic myself. Are there others?

A slight tangent -- there are various tools that implement sharding
over relational databases like MySQL.  Two that I know of are
DBSlayer,
http://code.nytimes.com/projects/dbslayer
and MySQL Proxy,
http://forge.mysql.com/wiki/MySQL_Proxy

I don't know of any formal comparisons between sharding traditional
database servers and distributed databases like HBase.
-Stuart


RE: Why is scaling HBase much simpler then scaling a relational db?

2008-08-20 Thread Jim Kellerman
Stuart,

In general you will get a quicker response to HBase questions by posting them 
to the HBase mailing list ([EMAIL PROTECTED]); see 
http://hadoop.apache.org/hbase/mailing_lists.html for how to subscribe.

Perhaps the best document on scaling HBase is actually the Bigtable paper:
http://labs.google.com/papers/bigtable.html


---
Jim Kellerman, Senior Engineer; Powerset (a Microsoft Company)

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
 Behalf Of Stuart Sierra
 Sent: Wednesday, August 20, 2008 1:03 PM
 To: core-user@hadoop.apache.org
 Subject: Re: Why is scaling HBase much simpler then scaling a relational db?

 On Tue, Aug 19, 2008 at 9:44 AM, Mork0075 [EMAIL PROTECTED] wrote:
  Can you please explain why someone should use HBase for horizontal
  scaling instead of a relational database? One reason for me would be
  that I don't have to implement the sharding logic myself. Are there others?

 A slight tangent -- there are various tools that implement sharding
 over relational databases like MySQL.  Two that I know of are
 DBSlayer,
 http://code.nytimes.com/projects/dbslayer
 and MySQL Proxy,
 http://forge.mysql.com/wiki/MySQL_Proxy

 I don't know of any formal comparisons between sharding traditional
 database servers and distributed databases like HBase.
 -Stuart


RE: Cannot read reducer values into a list

2008-08-20 Thread Deepika Khera
Thanks...this works beautifully :) !

Deepika

-Original Message-
From: Owen O'Malley [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, August 20, 2008 7:52 AM
To: core-user@hadoop.apache.org
Subject: Re: Cannot read reducer values into a list


On Aug 19, 2008, at 4:57 PM, Deepika Khera wrote:

 Thanks for the clarification on this.

 So, it seems like cloning the object before adding to the list is the
 only solution for this problem. Is that right?

Yes. You can use WritableUtils.clone to do the job.

-- Owen


hadoop 0.18.0 ec2 images?

2008-08-20 Thread Karl Anderson
Are there any publicly available EC2 images for Hadoop 0.18.0 yet?   
There don't seem to be any in the hadoop-ec2-images bucket.