Re: No FileSystem for scheme: file

2007-09-27 Thread aonewa
I have the same problem but can't solve it. Help me, please. Tom White wrote: I've seen this error running Hadoop when hadoop-default.xml was missing from the classpath. Tom On 24/09/2007, jibjoice [EMAIL PROTECTED] wrote: when I use Nutch's search, I'm getting an error that
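Tom's diagnosis can be checked mechanically: scan each classpath entry for hadoop-default.xml before launching. A minimal sketch in plain Python (the helper name and demo directory are illustrative, not part of Hadoop):

```python
import os
import tempfile

def find_on_classpath(filename, classpath):
    """Return the first classpath directory that contains filename, else None."""
    for entry in classpath.split(os.pathsep):
        if os.path.isdir(entry) and os.path.isfile(os.path.join(entry, filename)):
            return entry
    return None

# Demo: a throwaway directory stands in for $HADOOP_HOME/conf.
conf_dir = tempfile.mkdtemp()
open(os.path.join(conf_dir, "hadoop-default.xml"), "w").close()
hit = find_on_classpath("hadoop-default.xml",
                        os.pathsep.join(["/nonexistent", conf_dir]))
```

If the lookup comes back empty for your real CLASSPATH, adding the conf directory (not the file itself) to the classpath should clear the "No FileSystem for scheme: file" error.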

RE: Hadoop Get-Together Details

2007-09-27 Thread Rob Styles
Or any UK developers interested in drinking warm, flat beer while watching the rain? Rob Styles Programme Manager, Data Services, Talis tel: +44 (0)870 400 5000 fax: +44 (0)870 400 5001 direct: +44 (0)870 400 5004 mobile: +44 (0)7971 475 257 msn: [EMAIL PROTECTED] blog:

RE: build question

2007-09-27 Thread Runping Qi
Try adding something like the following lines to your build.xml: <path id="project.classpath"> ... <pathelement location="${hadoop.home}/contrib/hadoop-datajoin.jar"/> ... </path> Runping -Original Message- From: C G [mailto:[EMAIL PROTECTED] Sent: Wednesday,

RE: build question

2007-09-27 Thread C G
Thanks Runping! For any other ant-challenged souls, I solved the problem based on Runping's comments using: <path id="proto.classpath"> <path refid="classpath"/> <pathelement location="${build.dir}/hadoop-datajoin.jar"/> </path> and then in my compile phase within the <javac>...</javac> rules:

RE: build question

2007-09-27 Thread Runping Qi
You are welcome. I just realized that I missed the second half (<classpath refid="proto.classpath"/>). But glad you figured it out. Runping -Original Message- From: C G [mailto:[EMAIL PROTECTED] Sent: Thursday, September 27, 2007 7:49 AM To: hadoop-user@lucene.apache.org Subject: RE:

Re: Hadoop Get-Together Details

2007-09-27 Thread Doug Cutting
C G wrote: Are there any other east coast developers interested in a Boston-area get-together? FYI, I'll be at ApacheCon in Atlanta this November 14th and 15th, which might be a good place for a Hadoop BOF. http://www.us.apachecon.com/ Doug

Re: Hadoop Get-Together Details

2007-09-27 Thread Dmitry
I am interested. As usual, ApacheCon is very interesting in itself. Thanks, DT www.ejinz.com - Original Message - From: Doug Cutting [EMAIL PROTECTED] To: hadoop-user@lucene.apache.org Sent: Thursday, September 27, 2007 11:22 AM Subject: Re: Hadoop Get-Together Details C G wrote: Are

computing conditional probabilities with Hadoop?

2007-09-27 Thread Chris Dyer
Hi all-- I'm new to using Hadoop, so I'm hoping to get a little guidance on the best way to solve a particular class of problems. The general use case is this: from a very small set of data, I will generate a massive set of pairs of values, i.e., (A, B). I would like to compute the

Re: computing conditional probabilities with Hadoop?

2007-09-27 Thread Ted Dunning
I work on very similar problems (except for a pronounced dislike of MLEs). The easiest way to approach your problem is as a sequence of three map-reduce steps. Step 1, count pairs. map: (A, B) => ((A, B), 1); combine and reduce: (key, counts) => (key, sum(counts)); output name: pairs. Step 2,
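Ted's pipeline can be simulated locally in plain Python. This is a sketch of the counting logic only, not Hadoop API code; it assumes (as the conditional-probability goal implies) that the later steps compute marginal counts per A and take the ratio. All function names are illustrative:

```python
from collections import defaultdict

def count_pairs(pairs):
    """Step 1: map each (A, B) to ((A, B), 1), then sum counts per key."""
    counts = defaultdict(int)
    for a, b in pairs:                      # map + combine/reduce collapsed locally
        counts[(a, b)] += 1
    return counts

def count_marginals(pair_counts):
    """Step 2 (assumed): re-key pair counts by A alone and sum, giving count(A)."""
    totals = defaultdict(int)
    for (a, _b), n in pair_counts.items():
        totals[a] += n
    return totals

def conditional_probabilities(pair_counts, marginals):
    """Step 3 (assumed): join on A and emit P(B|A) = count(A, B) / count(A)."""
    return {(a, b): n / marginals[a] for (a, b), n in pair_counts.items()}

data = [("x", "p"), ("x", "q"), ("x", "p"), ("y", "q")]
pc = count_pairs(data)
probs = conditional_probabilities(pc, count_marginals(pc))
```

On a cluster each function body would be split back into separate map and reduce phases, with the framework doing the grouping that the dictionaries do here.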

Re: computing conditional probabilities with Hadoop?

2007-09-27 Thread Colin Evans
Hi Chris, This requires some finesse, but you can do it all with 3 map/reduce steps. The third step requires that you do a join, which is the tricky part. Basically, the operation looks like this: Map/Reduce 1: pairwise sums. Map: (A, B) => ((A, B), 1). Reduce: ((A, B), [1, 1, 1, ...]) => ((A, B), sum(A, B))
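The tricky join Colin mentions is typically done as a reduce-side join: tag each record with its source, sort so the marginal count for A arrives first in its group, then stream the pair counts past it. A local Python simulation of that idea (a sketch, not Hadoop code; record layout and names are assumptions):

```python
from itertools import groupby

def reduce_side_join(pair_counts, marginals):
    """Join pair counts with marginals on key A and emit count(A, B) / count(A).

    Records are tuples (A, tag, B, n). Tag 0 marks the marginal count(A),
    tag 1 marks a pair count(A, B); sorting puts the marginal first in each
    key group, mimicking a secondary sort in a real reduce-side join.
    """
    records = [(a, 0, None, n) for a, n in marginals.items()]
    records += [(a, 1, b, n) for (a, b), n in pair_counts.items()]
    records.sort()
    out = {}
    for a, group in groupby(records, key=lambda r: r[0]):
        group = list(group)
        total = group[0][3]                 # tag 0 sorted first: the marginal
        for _a, _tag, b, n in group[1:]:
            out[(a, b)] = n / total
    return out

pc = {("x", "p"): 2, ("x", "q"): 1, ("y", "q"): 1}
joined = reduce_side_join(pc, {"x": 3, "y": 1})
```

On a real cluster the sort and grouping are supplied by the framework's shuffle; only the tagging and the streaming reduce logic need to be written.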

Announcing release of Kosmos Filesystem (KFS)

2007-09-27 Thread Sriram Rao
Greetings! We are happy to announce the release of the Kosmos Filesystem (KFS) as an open source project. KFS was designed and implemented at Kosmix Corp. The initial release of KFS is version 0.1 (alpha). The source code as well as pre-compiled binaries for the x86-64-Linux-FC5 platform is available

Shuffle or redistribute function

2007-09-27 Thread Nathan Wang
I saw a similar post (http://www.mail-archive.com/hadoop-user@lucene.apache.org/msg01112.html) but the answer was not very satisfactory. Imagine I used Hadoop as fault-tolerant storage. I had 10 nodes, each loaded with 200 GB. I found the nodes were overloaded and decided to add 2 new boxes

Re: Shuffle or redistribute function

2007-09-27 Thread Ted Dunning
I just spent some time evaluating rebalancing options. The method that I found most useful was to walk through my data a directory at a time incrementing the replication count on that directory's contents, waiting a minute and then dropping it back down. That resulted in a rebalanced storage
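Ted's walk-bump-and-drop loop is easy to script. A Python sketch with a stub in place of the real DFS client (in practice you would invoke the filesystem shell's set-replication operation on each directory; the stub class and its method names are assumptions for illustration):

```python
import time

class DfsStub:
    """Stand-in for a DFS client that records replication changes."""
    def __init__(self):
        self.calls = []
    def set_replication(self, path, factor):
        self.calls.append((path, factor))

def rebalance(dfs, directories, base=3, boost=1, pause=0.0):
    """Walk directories one at a time: raise replication so new blocks land
    on the freshly added nodes, wait for re-replication to complete, then
    drop back to the base factor so the extra copies are reclaimed."""
    for d in directories:
        dfs.set_replication(d, base + boost)
        time.sleep(pause)               # in practice: wait on the order of a minute
        dfs.set_replication(d, base)

dfs = DfsStub()
rebalance(dfs, ["/data/a", "/data/b"])
```

The net effect is the rebalancing Ted describes: each directory's blocks are temporarily over-replicated onto the emptier nodes, and the copies deleted afterward tend to come off the overloaded ones.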