Dennis Kubes
I am just getting started with HBase. Thinking about using it for
future Nutch development. I have successfully built it with the ant scripts. In
the src I see conf and bin directories similar to Hadoop. But in the
build I don't see those. Is there a build that I can drop into a
directory that
You can also use a MapRunnable implementation, but then the variables
would be global only within each map task.
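For illustration, a minimal sketch of that approach against the old
org.apache.hadoop.mapred API (the class name, key/value types, and the
counter are made up here, and exact signatures vary a little between
Hadoop versions):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class MyMapRunner implements MapRunnable<LongWritable, Text, Text, Text> {

  // "global" state, but only within this one map task
  private long recordsSeen = 0;

  public void configure(JobConf job) {
    // initialize the per-task globals here if needed
  }

  public void run(RecordReader<LongWritable, Text> input,
                  OutputCollector<Text, Text> output,
                  Reporter reporter) throws IOException {
    LongWritable key = input.createKey();
    Text value = input.createValue();
    while (input.next(key, value)) {
      recordsSeen++;  // shared across every record this task processes
      output.collect(new Text(Long.toString(key.get())), value);
    }
  }
}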
Dennis Kubes
James Yu wrote:
For example:
I put all user global variables in a class I called MyGlobals
public class MyGlobals {
public static int var1;
...
}
Then, in whatever map
copy to itself.
So, long story short, on Windows, when running local jobs with hadoop
0.15, always use the C:/ notation (e.g. C:/hadoop/data rather than
C:\hadoop\data) to avoid problems.
Dennis Kubes
</description>
</property>
I already have this running on a development cluster so if you need help
getting it up and running, shoot me an email and we can figure it out
through email or IM.
Dennis Kubes
Daniel Wressle wrote:
Sorry to spam you people to death, but going through the logs I noticed
to
unzip your jar, create a lib directory in your jar, and then put any
referenced third party jars in that lib directory to be included in your
jar, then zip it back up and deploy as a single jar.
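Roughly, the steps look like this (the jar names and paths are just
placeholders for the example):

mkdir job-tmp && cd job-tmp
unzip ../myjob.jar               # unpack the existing job jar
mkdir lib
cp /path/to/thirdparty.jar lib/  # any third party jars go under lib/
zip -r ../myjob-with-lib.jar .   # zip it back up with lib/ included
cd .. && rm -r job-tmp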
Of course the new patch does away with the need to do this :)
Dennis Kubes
Dennis, your patch
HADOOP-1622 fixes this to allow multiple resources, including jars, to
be submitted for a single mapreduce job. There is currently a patch
that works but still needs a little fixing. I should be able to get
it finished in the next couple of days.
Dennis Kubes
Eyal Oren wrote:
On 08/13/07
How does this fix the non-ECC memory errors?
Dennis Kubes
Daeseong Kim wrote:
To solve the checksum errors on the non-ECC memory machines, I
modified some code in DFSClient.java and DataNode.java.
The idea is very simple.
The original CHUNK structure is
{chunk size}{chunk data}{chunk size
No, other than that you have to have the jars on all machines and it only
supports jar files.
Dennis Kubes
Joydeep Sen Sarma wrote:
I found depositing required jars into the lib directory works just great
(all those jars are prepended to the classpath by the hadoop script).
Any flaws doing
load during the hang?
Dennis Kubes
Eyal Oren wrote:
Hi,
We're seeing some of our map tasks hanging indefinitely during
execution, and I just wanted to check if somebody maybe had seen similar
things (to figure out whether it is a hadoop problem or our problem).
We have a job that runs
If a single node that holds a replica crashes, then once the namenode
has been updated the data will be re-replicated from one of the other 2
replicas onto another system, if one is available.
Dennis Kubes
Venkates .P.B. wrote:
Am I missing something very fundamental ? Can someone comment
I don't know about your record count but the link error means that you
don't have the right version of glibc that was used to compile the
hadoop native libraries. It shouldn't matter though as hadoop will fall
back to the Java versions if the native libraries can't be used.
Dennis Kubes
Sandhya E
beginning to fail, just wanted to know if anyone else is
seeing similar behavior?
Dennis Kubes
How do I finalize the DFS Upgrade? Do I just need to remove the
previous directories or is there a script or command line option that
will do this for me?
Dennis Kubes
For anybody else who needs to know this the command is:
bin/hadoop dfsadmin -finalizeUpgrade
Dennis Kubes
Dennis Kubes wrote:
How do I finalize the DFS Upgrade? Do I just need to remove the
previous directories or is there a script or command line option that
will do this for me?
Dennis
Just curious what linux filesystems people are using for large hadoop
installations (like maybe at yahoo :) and what performance they are
seeing with those filesystems?
Dennis Kubes
Dennis Kubes wrote:
Ok, I read the JIRA and have been hacking away at this for the past
couple of hours. I have a workable patch that I just need to test.
It follows what the JIRA proposed: creating a master job.jar file from
the multiple job jar files passed. I will test and post
will need to setup
both ssh keys from that single machine to all slaves and add the slave
machines to the slaves file.
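Roughly something like this from the master, where the user and host
names are only placeholders:

ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
# append the public key to each slave's authorized_keys
cat ~/.ssh/id_rsa.pub | ssh hadoop@slave1 'cat >> ~/.ssh/authorized_keys'
# list every slave, one host per line, in conf/slaves
echo slave1 >> conf/slaves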
Dennis Kubes
Phantom wrote:
Hi
I had a question about ways of setting up large clusters. I did read the
WIKI which has a posting on this matter and I have also been through
This would be read once per map task, not once per map entry. A third
option is using a custom MapRunner but I think that is overkill for this.
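Something along these lines with the old mapred API, where the property
name and key/value types are only placeholders:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class MyMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private String setting;  // populated once when the task starts

  public void configure(JobConf job) {
    // runs once per map task, not once per record
    setting = job.get("my.custom.setting", "default");
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output,
                  Reporter reporter) throws IOException {
    // setting is already available here for every record
    output.collect(new Text(setting), value);
  }
}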
Dennis Kubes
Ilya Vishnevsky wrote:
Well, as far as I understand the Properties object is packed into an xml
file before being sent to the Mapper, so it can
be occurring.
Dennis Kubes
Thanks
Avinash
Is there a way to redistribute blocks evenly across all DFS nodes? If
not I would be happy to program a tool to do so but I would need a
little guidance on how to.
Dennis Kubes
127.0.0.1 localhost.localdomain localhost
192.x.x.x yourhost.yourdomain.com yourhost
Dennis Kubes
Cedric Ho wrote:
Oh, and I also tried to use 192.168.1.179 as a datanode itself, and
only this datanode connects to the namenode on this same host
successfully.
On 5/14
Doug,
Do we know if this is a hardware issue? If it is possibly a software
issue I can dedicate some resources to tracking down bugs. I would just
need a little guidance on where to start looking.
Dennis Kubes
Doug Cutting wrote:
Do you have ECC memory on your nodes? Nodes without ECC
Doug Cutting wrote:
Dennis Kubes wrote:
Do we know if this is a hardware issue? If it is possibly a software
issue I can dedicate some resources to tracking down bugs. I would
just need a little guidance on where to start looking.
We don't know. The checksum mechanism is designed
on big clusters? Two, does anyone know if this is hardware or software
related? Here are some examples.
Dennis Kubes
org.apache.hadoop.fs.ChecksumException: Checksum error:
/d01/hadoop/mapred/local/task_0042_m_001905_0/spill0.out at 79597056
at
org.apache.hadoop.fs.ChecksumFileSystem
I can read files through fs cat. Also the errors, once rescheduled, will
most often fix themselves, although sometimes enough of them occur
that a single job will fail.
Dennis Kubes
Raghu Angadi wrote:
Can you manually try to read one such file with 'hadoop fs -cat
://linuxquality.sunsite.dk/articles/testsuites/
Dennis Kubes
Owen O'Malley wrote:
On Apr 23, 2007, at 7:39 AM, Steve Schlosser wrote:
I've got a small hadoop cluster running (5 nodes today, going to 15+
soon), and I'd like to do some benchmarking. My question to the group
is - what is the first benchmark you
What is the ratio of checksum errors that everyone else is seeing while
running large jobs? I am trying to determine what an average number of
checksum errors is vs. what should be occurring.
Dennis Kubes
There is a log4j.properties file in the conf directory as well. This
directory is conf by default but can be changed via the HADOOP_CONF_DIR
variable. Although I haven't tested it I believe it can also be set
through the classpath.
Dennis Kubes
Andrew Jsyqf wrote:
Hi all,
In the hadoop
You can export it in the shell or set it in the conf/hadoop-env.sh script.
export HADOOP_CONF_DIR=
./runscripthere
If you want to use the conf/log4j.properties file then you shouldn't
have to do anything as it is set by default.
Dennis Kubes
Feng Jiang wrote:
I did find the conf
);
secondjob.setInputValueClass(Text.class);
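Fleshed out a bit, the second job's setup might look roughly like this
(older JobConf API; the driver classes and paths are invented for the
example, and setInputKeyClass/setInputValueClass only exist on the older
releases):

JobConf secondjob = new JobConf(SecondPass.class);
secondjob.setInputPath(new Path("first-job-output"));
secondjob.setOutputPath(new Path("second-job-output"));
secondjob.setInputKeyClass(Text.class);
secondjob.setInputValueClass(Text.class);
secondjob.setMapperClass(SecondPassMapper.class);
secondjob.setReducerClass(SecondPassReducer.class);
JobClient.runJob(secondjob);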
Dennis Kubes
Alejandro Abdelnur wrote:
I may be missing something silly here,
I have a MR that generates an output type (Text,Text)
Consuming that output for another MR it becomes a plain text file thus the
input is (LongWritable, Text) with the long key being the line number
changed it to use local and it still
errored. I opened the job file and the class that it says it can't find,
org.apache.nutch.crawl.CrawlDatum, is there, so I am a little stumped.
Dennis Kubes
held in the
hadoop log file determined by hadoop.log.dir and hadoop.log.file?
Is this correct?
Dennis Kubes
I don't know if I completely understand what you are asking but let me
try to answer your questions.
David Pollak wrote:
Howdy,
Is there a way to store by-product data someplace where it can be
read? For example, as I'm iterating over a collection of documents, I
want to generate some
David Pollak wrote:
On Nov 8, 2006, at 7:41 AM, Dennis Kubes wrote:
I don't know if I completely understand what you are asking but let
me try to answer your questions.
David Pollak wrote:
Howdy,
Is there a way to store by-product data someplace where it can be
read? For example, as I'm
One of the servers probably didn't shut down completely. You would need
to check each of your nodes like this:
ps -ef | grep java
See if any of the servers are running. If they are, they can be killed
manually like this, where you put in the process_id:
kill -9 process_id
Dennis
Sanjay
This is probably a simple question but when I run my MR job I am getting
10 splits and therefore 10 output files like part-x. Is there a way
to merge those outputs into a single file using the currently running MR
job or do I need to run another MR job to merge them?
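(For reference, one common way to end up with a single output file, if the
job can live with a single reducer, is to force one reduce task; a sketch
with the old JobConf API, class name made up:)

JobConf job = new JobConf(MyJob.class);
job.setNumReduceTasks(1);  // everything funnels through one reducer, so one part file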
Dennis Kubes
in advance!
Lorenzo Thione
On May 24, 2006, at 7:31 AM, Dennis Kubes wrote:
Using Java 5 will allow the threads of various tasks to take
advantage of multiple processors. Just make sure you set your map
tasks property to a multiple of the number of processors total. We
are running multi-core
that creates a reader shared by an inner mapper class or is that hacking
the interfaces when I should be thinking about this in terms of sequential
processing?
Dennis
Doug Cutting wrote:
Dennis Kubes wrote:
The problem is that I have a single url. I get the inlinks to that
url and then I need to go
throughput. This is a different type of thinking than coding
an algorithm for a single machine so I am learning as I go. Thanks for
your help.
Dennis
Doug Cutting wrote:
Dennis Kubes wrote:
Ok. This is a little different in that I need to start thinking
about my algorithms in terms
I keep seeing references to job.jar files. Can someone explain what the
job.jar files are and are they only used in distributed mode?
Dennis
Is there a way to look inside of an index that is on the DFS? Like Luke on
local?
Dennis
For the Hadoop filesystem, I know that it is basically unlimited in terms of
storage because one can always add new hardware, but is it unlimited in
terms of a single file?
What I mean by this is if I store a file /user/dir/a.index and this file has
say 100 blocks in it where there is only enough