Re: Getting started with HBase

2007-10-19 Thread Dennis Kubes
. Dennis Kubes

Getting started with HBase

2007-10-18 Thread Dennis Kubes
I am just getting started with HBase. Thinking about using it for future Nutch development. Have successfully built with ant scripts. In the src I see conf and bin directories similar to Hadoop. But in the build I don't see those. Is there a build that I can drop into a directory that

Re: coding question: user's global variables

2007-10-12 Thread Dennis Kubes
You can also use a MapRunnable implementation but that would allow global only to each Map task. Dennis Kubes James Yu wrote: For example: I put all user global variables in a class I called MyGlobals public class MyGlobals { static public int var1; ... } Then, in whatever map

File Paths, Hadoop = 0.15 and Local Jobs

2007-10-12 Thread Dennis Kubes
copy to itself. So, long story short, on windows, when running local jobs with hadoop = 0.15, always use the C:/ notation to avoid problems. Dennis Kubes

Re: External jars revisited.

2007-10-10 Thread Dennis Kubes
. </description> </property> I already have this running on a development cluster so if you need help getting it up and running, shoot me an email and we can figure it out through email or IM. Dennis Kubes Daniel Wressle wrote: Sorry to spam you people to death, but going through the logs I noticed

Re: External jars revisited.

2007-10-10 Thread Dennis Kubes
to unzip your jar, create a lib directory in your jar, and then put any referenced third party jars in that lib directory to be included in your jar, then zip it back up and deploy as a single jar. Of course the new patch does away with the need to do this :) Dennis Kubes Dennis, your patch

Re: Specifying external jars in the classpath for Hadoop

2007-08-14 Thread Dennis Kubes
HADOOP-1622 fixes this to allow multiple resources, including jars, to be submitted for a single mapreduce job. There is currently a patch that works but still needs a little fixing. I should be able to get it finished in the next couple of days. Dennis Kubes Eyal Oren wrote: On 08/13/07

Re: To solve the checksum errors on the non-ecc mem machines.

2007-08-14 Thread Dennis Kubes
How does this fix the non-ecc memory errors? Dennis Kubes Daeseong Kim wrote: To solve the checksum errors on the non-ecc memory machines, I modified some codes in DFSClient.java and DataNode.java. The idea is very simple. The original CHUNK structure is {chunk size}{chunk data}{chunk size

Re: Specifying external jars in the classpath for Hadoop

2007-08-14 Thread Dennis Kubes
No, other than you have to have the jars on all machines and it only supports jar files. Dennis Kubes Joydeep Sen Sarma wrote: i found depositing required jars into the lib directory works just great (all those jars are prepended to the classpath by the hadoop script). Any flaws doing

Re: has anyone seen jobs hanging randomly without terminating?

2007-08-14 Thread Dennis Kubes
load during the hang? Dennis Kubes Eyal Oren wrote: Hi, We're seeing some of our map tasks hanging indefinitely during execution, and I just wanted to check if somebody maybe had seen similar things (to figure out whether things are a hadoop problem or our problem). We have a job that runs

Re: Loading data into HDFS

2007-08-03 Thread Dennis Kubes
. If a single node that holds a replica crashes once the namenode has been updated, then the data will be re-replicated from one of the other 2 replicas to a third system if available. Dennis Kubes Venkates .P.B. wrote: Am I missing something very fundamental? Can someone comment

Re: compressed input files

2007-08-02 Thread Dennis Kubes
I don't know about your record count but the link error means that you don't have the right version of glibc that was used to compile the hadoop native libraries. It shouldn't matter though as hadoop will fall back to the java versions if the native can't be used. Dennis Kubes Sandhya E

Hard Disk Failures

2007-07-31 Thread Dennis Kubes
beginning to fail, just wanted to know if anyone else is seeing similar behavior? Dennis Kubes

How to Finalize DFS Upgrade?

2007-07-31 Thread Dennis Kubes
How do I finalize the DFS Upgrade? Do I just need to remove the previous directories or is there a script or command line option that will do this for me? Dennis Kubes

Re: How to Finalize DFS Upgrade?

2007-07-31 Thread Dennis Kubes
For anybody else who needs to know this the command is: bin/hadoop dfsadmin -finalizeUpgrade Dennis Kubes Dennis Kubes wrote: How do I finalize the DFS Upgrade? Do I just need to remove the previous directories or is there a script or command line option that will do this for me? Dennis

Linux Filesystems Used

2007-07-30 Thread Dennis Kubes
Just curious what linux filesystems people are using for large hadoop installations (like maybe at yahoo :) and what performance they are seeing with those filesystems? Dennis Kubes

Re: Multiple Job Jar files?

2007-07-19 Thread Dennis Kubes
Kubes Dennis Kubes wrote: Ok, I read the JIRA and have been hacking away at this for the past couple of hours. I have a workable patch for that I just need to test. It follows what the JIRA proposed to create a master job.jar file from multiple job jar files passed. I will test and post

Re: Setting up large clusters

2007-06-08 Thread Dennis Kubes
will need to both set up ssh keys from that single machine to all slaves and add the slave machines to the slaves file. Dennis Kubes Phantom wrote: Hi I had a question about ways of setting up large clusters. I did read the WIKI which has a posting on this matter and I have also been through
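
The two steps named in this reply (keys to every slave, hosts listed in the slaves file) might be scripted roughly as below. This is only a sketch under assumptions: the function names `list_slaves` and `print_key_commands` are hypothetical, the slaves file is assumed to hold one host per line with optional `#` comments, and `ssh-copy-id` is assumed to be installed on the master. The commands are printed rather than executed so they can be reviewed first.

```shell
# Print one host per line from a slaves file, dropping "#" comments,
# trailing whitespace, and blank lines.
list_slaves() {
    sed -e 's/#.*$//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1"
}

# Print (do not run) the key-distribution command for each slave host.
print_key_commands() {
    list_slaves "$1" | while read -r host; do
        printf 'ssh-copy-id %s\n' "$host"
    done
}

# Example (the path is an assumption about the install layout):
#   print_key_commands conf/slaves
```

Printing the commands keeps the sketch harmless on a machine that is not the cluster master; pipe the output to `sh` once the host list looks right.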

Re: What objects can be passed to Mapper and Reducer classes using Configuration?

2007-05-31 Thread Dennis Kubes
. This would be read once per map task, not once per map entry. A third option is using a custom MapRunner but I think that is overkill for this. Dennis Kubes Ilya Vishnevsky wrote: Well, as far as I understand the Properties object is packed into an XML file before being sent to the Mapper, so it can

Re: Configuration and Hadoop cluster setup

2007-05-24 Thread Dennis Kubes
be occurring. Dennis Kubes Thanks Avinash

Redistribute blocks evenly across DFS

2007-05-16 Thread Dennis Kubes
Is there a way to redistribute blocks evenly across all DFS nodes? If not, I would be happy to program a tool to do so, but I would need a little guidance on how to. Dennis Kubes

Re: Namenode cannot accept connection from datanode

2007-05-14 Thread Dennis Kubes
. 127.0.0.1 localhost.localdomain localhost 192.x.x.x yourhost.yourdomain.com yourhost Dennis Kubes Cedric Ho wrote: Oh, and I also tried to use 192.168.1.179 as a datanode itself, and only this datanode connect to the namenode on this same host successfully. On 5/14

Re: Many Checksum Errors

2007-05-02 Thread Dennis Kubes
Doug, do we know if this is a hardware issue? If it is possibly a software issue I can dedicate some resources to tracking down bugs. I would just need a little guidance on where to start looking. Dennis Kubes Doug Cutting wrote: Do you have ECC memory on your nodes? Nodes without ECC

Re: Many Checksum Errors

2007-05-02 Thread Dennis Kubes
Doug Cutting wrote: Dennis Kubes wrote: Do we know if this is a hardware issue? If it is possibly a software issue I can dedicate some resources to tracking down bugs. I would just need a little guidance on where to start looking. We don't know. The checksum mechanism is designed

Many Checksum Errors

2007-05-01 Thread Dennis Kubes
on big clusters? Two, does anyone know if this is hardware or software related? Here are some examples. Dennis Kubes org.apache.hadoop.fs.ChecksumException: Checksum error: /d01/hadoop/mapred/local/task_0042_m_001905_0/spill0.out at 79597056 at org.apache.hadoop.fs.ChecksumFileSystem

Re: Many Checksum Errors

2007-05-01 Thread Dennis Kubes
I can read files through fs cat. Also, the errors will most often fix themselves once the task is rescheduled, although sometimes enough of them occur that a single job will fail. Dennis Kubes Raghu Angadi wrote: Can you manually try to read one such file with 'hadoop fs -cat

Re: Benchmarking question - how do you test a new cluster?

2007-04-23 Thread Dennis Kubes
://linuxquality.sunsite.dk/articles/testsuites/ Dennis Kubes Owen O'Malley wrote: On Apr 23, 2007, at 7:39 AM, Steve Schlosser wrote: I've got a small hadoop cluster running (5 nodes today, going to 15+ soon), and I'd like to do some benchmarking. My question to the group is - what is the first benchmark you

Ratio of Checksum Errors

2007-04-22 Thread Dennis Kubes
What is the ratio of checksum errors that everyone else is seeing while running large jobs? I am trying to determine what an average number of checksum errors is vs. what should be occurring. Dennis Kubes

Re: Questions about log

2007-02-17 Thread Dennis Kubes
There is a log4j.properties file in the conf directory as well. This directory is conf by default but can be changed via the HADOOP_CONF_DIR variable. Although I haven't tested it I believe it can also be set through the classpath. Dennis Kubes Andrew Jsyqf wrote: Hi all, In the hadoop

Re: Questions about log

2007-02-17 Thread Dennis Kubes
You can export it in the shell or set it in conf/hadoop-env.sh script. export HADOOP_CONF_DIR= ./runscripthere If you want to use the conf/log4j.properties file then you shouldn't have to do anything as it is set by default. Dennis Kubes Feng Jiang wrote: I did find the conf
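
As a rough illustration of the behaviour described in these two replies (the conf directory is used by default, and `HADOOP_CONF_DIR` overrides it), here is a minimal sketch. The function name `resolve_conf_dir` and the example paths are hypothetical, and the sketch assumes the default directory is `conf` under the install directory.

```shell
# Print the configuration directory that would be used:
# HADOOP_CONF_DIR when it is set and non-empty, otherwise "conf"
# under the install directory passed as the first argument.
resolve_conf_dir() {
    printf '%s\n' "${HADOOP_CONF_DIR:-$1/conf}"
}

# Example (paths are placeholders, not taken from the thread):
#   resolve_conf_dir /opt/hadoop
#   HADOOP_CONF_DIR=/etc/hadoop-alt resolve_conf_dir /opt/hadoop
```

The log4j.properties file mentioned above would then be looked for inside whichever directory this prints.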

Re: SequenceFile (Text,Text) becomes plain text

2007-02-02 Thread Dennis Kubes
); secondjob.setInputValueClass(Text.class); Dennis Kubes Alejandro Abdelnur wrote: I may be missing something silly here, I have a MR that generates an output type (Text,Text) Consuming that output for another MR it becomes a plain text file thus the input is (LongWritable, Text) with the long key being the line number

In Memory FileSystem ClassNotFound

2007-01-30 Thread Dennis Kubes
changed it to use local and it still errored. I opened the job file and the class that it says it can't find org.apache.nutch.crawl.CrawlDatum is there so I am a little stumped. Dennis Kubes

Task Logs Purpose

2007-01-29 Thread Dennis Kubes
held in the hadoop log file determined by hadoop.log.dir and hadoop.log.file? Is this correct? Dennis Kubes

Re: some newby questions

2006-11-08 Thread Dennis Kubes
I don't know if I completely understand what you are asking but let me try to answer your questions. David Pollak wrote: Howdy, Is there a way to store by-product data someplace where it can be read? For example, as I'm iterating over a collection of documents, I want to generate some

Re: some newby questions

2006-11-08 Thread Dennis Kubes
David Pollak wrote: On Nov 8, 2006, at 7:41 AM, Dennis Kubes wrote: I don't know if I completely understand what you are asking but let me try to answer your questions. David Pollak wrote: Howdy, Is there a way to store by-product data someplace where it can be read? For example, as I'm

Re: mapred.JobTracker : Connection refused

2006-09-11 Thread Dennis Kubes
One of the servers probably didn't shut down completely. You would need to check each of your nodes like this: ps -ef | grep java See if any of the servers are running. If they are they can be killed manually like this where you put in the process_id kill -9 process_id Dennis Sanjay
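
The per-node check described above can be wrapped in a small helper. This is a sketch only: the function name `java_pids` is hypothetical, and it assumes the common `ps -ef` layout in which the PID is the second column and the executable is the eighth. It matches any process whose executable is `java`, which may include JVMs unrelated to Hadoop, so review the list before killing anything.

```shell
# Read `ps -ef` style output on stdin and print the PID of every process
# whose executable (column 8) is java. Matching on the executable column
# keeps the grep/awk helper processes themselves out of the result.
java_pids() {
    awk '$8 ~ /(^|\/)java$/ { print $2 }'
}

# Typical use on each node (left as comments so nothing is killed by accident):
#   ps -ef | java_pids
#   kill -9 process_id
```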

Number of Reduce Outputs

2006-08-29 Thread Dennis Kubes
This is probably a simple question but when I run my MR job I am getting 10 splits and therefore 10 output files like part-x. Is there a way to merge those outputs into a single file using the currently running MR job or do I need to run another MR job to merge them? Dennis Kubes

Re: Multiple tasktrackers per node

2006-05-25 Thread Dennis Kubes
in advance! Lorenzo Thione On May 24, 2006, at 7:31 AM, Dennis Kubes wrote: Using Java 5 will allow the threads of various tasks to take advantage of multiple processors. Just make sure you set your map tasks property to a multiple of the number of processors total. We are running multi-core

Re: Help with MapReduce

2006-05-25 Thread Dennis Kubes
that creates a reader shared by an inner mapper class or is that hacking the interfaces when I should be thinking about this terms of sequential processing? Dennis Doug Cutting wrote: Dennis Kubes wrote: The problem is that I have a single url. I get the inlinks to that url and then I need to go

Re: Help with MapReduce

2006-05-25 Thread Dennis Kubes
throughput. This is a different type of thinking than coding an algorithm for a single machine so I am learning as I go. Thanks for your help. Dennis Doug Cutting wrote: Dennis Kubes wrote: Ok. This is a little different in that I need to start thinking about my algorithms in terms

Purpose of Job.jar

2006-04-06 Thread Dennis Kubes
I keep seeing references to job.jar files. Can someone explain what the job.jar files are and are they only used in distributed mode? Dennis

Look inside index on DFS

2006-03-31 Thread Dennis Kubes
Is there a way to look inside an index that is on the DFS, like Luke does on local? Dennis

Hadoop File Capacity

2006-03-26 Thread Dennis Kubes
For the Hadoop filesystem, I know that it is basically unlimited in terms of storage because one can always add new hardware, but is it unlimited in terms of a single file? What I mean by this is if I store a file /user/dir/a.index and this file has say 100 blocks in it where there is only enough