It's normal that they are all empty. Look at files with .log extension.
On Sunday, 12 April 2009 at 23:30, halilibrahimcakir
halilibrahimca...@mynet.com wrote:
I followed these steps:
$ bin/stop-all.sh
$ rm -ri /tmp/hadoop-root
$ bin/hadoop namenode -format
$ bin/start-all.sh
and looked
Sorry, here is the content of the log file (hadoop-root-namenode-debian.log):
2009-04-12 16:27:22,762 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = debian/127.0.1.1
On Mon, Apr 13, 2009 at 1:07 AM, Todd Lipcon t...@cloudera.com wrote:
Hey Brian,
This is really interesting stuff. I'm curious - have you tried these same
experiments using the Java API? I'm wondering whether this is FUSE-specific
or inherent to all HDFS reads. I'll try to reproduce this
Hey, I tried the following :
- created a dir temp for user A with permission 733
- created a dir temp/test for user B with permission 722
- created a file temp/test/test.txt for user B with permission 722
Now in HDFS, user A can list as well as read the contents of
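For reference, here is a rough, untested sketch of driving the same check from the Java API (it assumes the default configuration, made-up paths, and that whichever user runs it owns the paths it creates):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class PermissionCheck {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    // Recreate the layout from the experiment above (paths are made up).
    Path temp = new Path("/user/A/temp");
    Path test = new Path(temp, "test");
    fs.mkdirs(temp);
    fs.setPermission(temp, new FsPermission((short) 0733));
    fs.mkdirs(test);
    fs.setPermission(test, new FsPermission((short) 0722));

    // Run this part as the other user: does listing/reading succeed,
    // or does it throw an AccessControlException?
    for (FileStatus s : fs.listStatus(test)) {
      System.out.println(s.getPath() + " " + s.getPermission());
    }
  }
}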
Currently, only failed tasks are attempted on a node other than the one
where it failed. For killed tasks, there is no such policy for retries.
"Failed to report status" usually indicates that the task did not report
sufficient progress. However, it is possible that the task itself was not
Chukwa is a Hadoop subproject aiming to do something similar, though
particularly for the case of Hadoop logs. You may find it useful.
Hadoop unfortunately does not support concurrent appends. As a
result, the Chukwa project found itself creating a whole new daemon,
the Chukwa collector,
Hey
Did you find any class or way to store the results of Job1's map/reduce in
memory and use them as input to Job2's map/reduce? I am facing a situation
where I need to do a similar thing. If anyone can help me out...
Pankil
On Wed, Apr 8, 2009 at 12:51 AM, Sharad Agarwal
On Apr 10, 2009, at 11:12 AM, Sagar Naik wrote:
Hi,
I would like to implement a multi-threaded reducer.
As per my understanding, the system does not have one because we expect
the output to be sorted.
However, in my case I don't need the output sorted.
You'd probably want to make a blocking
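Something along these lines is what I'd try — a rough, untested sketch against the old mapred API: a bounded blocking queue feeds a small thread pool, and the collector is synchronized because OutputCollector is not thread-safe. Pool size and queue depth are made up.

import java.io.IOException;
import java.util.Iterator;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class ThreadedReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {

  // Bounded queue so reduce() blocks instead of buffering everything in memory;
  // CallerRunsPolicy makes the reduce thread do the work itself when the queue is full.
  private final ThreadPoolExecutor pool = new ThreadPoolExecutor(
      4, 4, 0L, TimeUnit.MILLISECONDS,
      new LinkedBlockingQueue<Runnable>(100),
      new ThreadPoolExecutor.CallerRunsPolicy());

  public void reduce(Text key, Iterator<Text> values,
                     final OutputCollector<Text, Text> out, Reporter reporter)
      throws IOException {
    final Text keyCopy = new Text(key); // Hadoop reuses key/value objects, so copy them
    while (values.hasNext()) {
      final Text valueCopy = new Text(values.next());
      pool.execute(new Runnable() {
        public void run() {
          try {
            Text result = process(valueCopy); // the expensive per-value work
            synchronized (out) {              // OutputCollector is not thread-safe
              out.collect(keyCopy, result);
            }
          } catch (IOException e) {
            throw new RuntimeException(e);
          }
        }
      });
    }
  }

  private Text process(Text value) {
    return value; // placeholder for the real computation
  }

  public void close() throws IOException {
    // Drain the pool before the task exits.
    pool.shutdown();
    try {
      pool.awaitTermination(1, TimeUnit.HOURS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}

Output within a key will come out in whatever order the workers finish, which is fine here since the poster does not need it sorted.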
Ari, thanks for your note.
I'd like to understand more about how Chukwa groups log entries... Say I have appA running
on machines X, Y and appB running on machines Y, Z, each of them calling the
Chukwa log API.
Do all entries go into the same HDFS file, or into 4 separate HDFS files
based on the
Hey Todd,
Been playing more this morning after thinking about it for the night
-- I think the culprit is not the network, but actually the cache.
Here's the output of your script adjusted to do the same calls as I
was doing (you had left out the random I/O part).
[br...@red tmp]$ java
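For comparison, here is a bare-bones sketch of the same kind of test through the Java API — not Brian's actual script, untested, with a made-up buffer size and iteration count: it does positioned reads at random offsets so the page-cache effect shows up in the timings.

import java.util.Random;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RandomReadTest {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path path = new Path(args[0]);          // a large file already in HDFS
    long length = fs.getFileStatus(path).getLen();

    byte[] buf = new byte[64 * 1024];
    Random rand = new Random();
    FSDataInputStream in = fs.open(path);

    long start = System.currentTimeMillis();
    for (int i = 0; i < 1000; i++) {
      long off = (long) (rand.nextDouble() * (length - buf.length));
      in.readFully(off, buf);               // positioned read at a random offset
    }
    in.close();
    System.out.println("1000 random reads took "
        + (System.currentTimeMillis() - start) + " ms");
  }
}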
Hey all,
I'm also extending ClusterMapReduceTestCase and having a bit of trouble.
Currently I'm getting:
Starting DataNode 0 with dfs.data.dir:
build/test/data/dfs/data/data1,build/test/data/dfs/data/data2
Starting DataNode 1 with dfs.data.dir:
Sorry, I forgot to include the non-IntelliJ-console output :)
09/04/13 12:07:14 ERROR mapred.MiniMRCluster: Job tracker crashed
java.lang.NullPointerException
at java.io.File.<init>(File.java:222)
at org.apache.hadoop.mapred.JobHistory.init(JobHistory.java:143)
at
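For what it's worth, that trace (a File constructor blowing up inside JobHistory.init) usually points at the hadoop.log.dir system property being unset when the mini cluster starts. A guess at a workaround — set the property before calling super.setUp(); the directory name here is made up:

import org.apache.hadoop.mapred.ClusterMapReduceTestCase;

public class MyClusterTest extends ClusterMapReduceTestCase {
  protected void setUp() throws Exception {
    // Hypothetical fix: JobHistory builds its log path from hadoop.log.dir,
    // and new File(null, ...) throws the NPE above when the property is missing.
    System.setProperty("hadoop.log.dir", "build/test/logs");
    super.setUp();
  }
}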
It need not be anything to worry about. Do you see anything at the user
level (a task, job, copy, or script) failing because of this?
On a distributed system with many nodes, there would be some errors on
some of the nodes for various reasons (load, hardware, reboot, etc).
HDFS usually should work
We're looking into power grid visualization and were wondering if anyone
could recommend a good java native lib (that plays nice with hadoop) to
render some layers of geospatial data. At this point we have the cluster
crunching our test data, formats, and data structures, and we're now
looking at
Jothi, thanks for the explanation. One question though: why shouldn't timed-out
tasks be retried on a different machine? As you pointed out, it could
very well have been due to the machine having problems. To me a timeout is
just like any other kind of failure.
-- Stefan
From: Jothi
I'm running some MapReduce jobs, and some of them hit out-of-memory errors, and I find
that the raw data itself also got corrupted and became zero bytes, which is very
strange to me. I did not look into it in much detail, but I just want to check
quickly with someone who has had such an experience. I'm running 0.18.3.
thanks
Hi Amar,
I have just tried it. Everything worked as expected. I guess user A in your
experiment was a superuser, so he could read anything.
Nicholas Sze
/// permission testing //
drwx-wx-wx - nicholas supergroup 0 2009-04-13 10:55
In hadoop-*-examples.jar, use randomwriter to generate the data and sort
to sort it.
- Aaron
On Sun, Apr 12, 2009 at 9:33 PM, Pankil Doshi forpan...@gmail.com wrote:
Your data is too small, I guess, for 15 nodes... so it might be the overhead
time of these nodes making your total MR jobs more
Hi Everyone,
I'm working on a relatively simple MapReduce job with a slight complication
with regards to the ordering of my key/values heading into the reducer. The
output from the mapper might be something like
cat -> doc5, 1
cat -> doc1, 1
cat -> doc5, 3
...
Here, 'cat' is my key and the
Mithila,
You said all the slaves were being utilized in the 3 node cluster. Which
application did you run to test that and what was your input size? If you
tried the word count application on a 516 MB input file on both cluster
setups, then some of your nodes in the 15-node cluster may not be
I'm not familiar with setOutputValueGroupingComparator
What about adding the doc# to the key and having your own
hashing/Partitioner?
so doing something like
cat_doc5 -> 1
cat_doc1 -> 1
cat_doc5 -> 3
the hashing method would take everything before _ as the hash.
the shuffling would still put the
I'm not sure if this is exactly what you want, but can you emit map records
as:
cat, doc5 -> 3
cat, doc1 -> 1
cat, doc5 -> 1
and so on..
This way, your reducers will get the intermediate key,value pairs as
cat, doc5 -> 3
cat, doc5 -> 1
cat, doc1 -> 1
then you can split the keys (cat, doc*)
Oh, I forgot to mention that you should change your partitioner to send all the
keys of the form cat,* to the same reducer, but it seems Jeremy has
been much faster than me :)
-Jim
On Mon, Apr 13, 2009 at 5:24 PM, Jim Twensky jim.twen...@gmail.com wrote:
I'm not sure if this is exactly
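To make the partitioner part concrete, here is a rough, untested sketch against the old mapred API. It assumes the composite key is a plain Text such as cat_doc5 (the class name and separator are made up): only the part before the separator feeds the hash, so all cat_* keys reach the same reducer.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

// Partition composite keys such as "cat_doc5" on the natural key ("cat")
// so that all records for the same word reach the same reducer.
public class NaturalKeyPartitioner implements Partitioner<Text, IntWritable> {

  public int getPartition(Text key, IntWritable value, int numPartitions) {
    String composite = key.toString();
    int sep = composite.indexOf('_');
    String naturalKey = (sep == -1) ? composite : composite.substring(0, sep);
    return (naturalKey.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }

  public void configure(JobConf job) {
    // no configuration needed
  }
}

It would be wired in with conf.setPartitionerClass(NaturalKeyPartitioner.class); a matching grouping comparator (setOutputValueGroupingComparator) is what gives the within-key ordering effect discussed earlier in the thread.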
HAMAKE is a make-like utility for Hadoop. More information is at the project page:
http://code.google.com/p/hamake/
Documentation is still quite poor, but core functionality is working
and I plan on improving it further.
Sincerely,
Vadim
Thanks Aaron.
Jim: The three clusters I set up had Ubuntu running on them and the DFS was
accessed at port 54310. The new cluster which I've set up has Red Hat Linux
release 7.2 (Enigma) running on it. Now when I try to access the DFS from one
of the slaves I get the following response: dfs cannot be
Can you ssh between the nodes?
-jim
On Mon, Apr 13, 2009 at 6:49 PM, Mithila Nagendra mnage...@asu.edu wrote:
Thanks Aaron.
Jim: The three clusters I set up had Ubuntu running on them and the DFS was
accessed at port 54310. The new cluster which I've set up has Red Hat Linux
release 7.2
Hello,
I am trying to use the Pellet library for some OWL inferencing in my mapper
class, but I can't find a way to bundle the library jar files in my job jar
file. I am exporting my project as a jar file from the Eclipse IDE. Will it work
if I create the jar manually and include all the jar files Pellet
Create a directory called 'lib' in your project's root dir, then put all the
3rd-party jars in it.
2009/4/14 Farhan Husain russ...@gmail.com
Hello,
I am trying to use the Pellet library for some OWL inferencing in my mapper
class, but I can't find a way to bundle the library jar files in my job jar
Hello,
The Java compiler generates the following warning when I use the MultipleOutputs
class, as the mos.getCollector(String, Reporter) method (mos being of type
MultipleOutputs) returns a raw OutputCollector instead of OutputCollector<K,V>.
warning: [unchecked] unchecked call to collect(K,V) as a member of the
raw type
Yes, I can.
On Mon, Apr 13, 2009 at 5:12 PM, Jim Twensky jim.twen...@gmail.com wrote:
Can you ssh between the nodes?
-jim
On Mon, Apr 13, 2009 at 6:49 PM, Mithila Nagendra mnage...@asu.edu
wrote:
Thanks Aaron.
Jim: The three clusters I set up had Ubuntu running on them and the DFS
was
Does anyone have an input formatter for bzip2?
--
Best Regards, Edward J. Yoon
edwardy...@apache.org
http://blog.udanax.org
Pankil Doshi wrote:
Hey
Did you find any class or way to store the results of Job1's map/reduce in
memory and use them as input to Job2's map/reduce? I am facing a situation
where I need to do a similar thing. If anyone can help me out...
Normally you would write the job output to a file and
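Concretely, the usual pattern looks something like this rough sketch (untested, old mapred API, class name and paths made up): Job1 writes to an intermediate HDFS directory, and Job2 reads that directory as its input.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ChainJobs {
  public static void main(String[] args) throws Exception {
    Path intermediate = new Path("/tmp/job1-output");

    JobConf job1 = new JobConf(ChainJobs.class);
    job1.setJobName("job1");
    FileInputFormat.setInputPaths(job1, new Path(args[0]));
    FileOutputFormat.setOutputPath(job1, intermediate);
    // ... set mapper/reducer/types for job1 ...
    JobClient.runJob(job1);                  // blocks until job1 finishes

    JobConf job2 = new JobConf(ChainJobs.class);
    job2.setJobName("job2");
    FileInputFormat.setInputPaths(job2, intermediate);   // job1's output feeds job2
    FileOutputFormat.setOutputPath(job2, new Path(args[1]));
    // ... set mapper/reducer/types for job2 ...
    JobClient.runJob(job2);
  }
}

Keeping the intermediate data purely in memory is not something plain Hadoop offers; the file-based handoff above is the common workaround.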
warning: [unchecked] unchecked call to collect(K,V) as a member of the
raw type org.apache.hadoop.mapred.OutputCollector
Yes, I can live with this warning, but it really makes me uneasy. Any
suggestions to remove this warning?
You can suppress the warning using an annotation in your code:
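Roughly like this — my own sketch rather than what was originally posted, with a made-up helper name: confine the unchecked call to one small method and annotate it, instead of living with the warning everywhere.

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

public class MultipleOutputsHelper {
  // getCollector returns a raw OutputCollector, so the unchecked-call warning
  // is silenced here at the narrowest possible scope.
  @SuppressWarnings("unchecked")
  public static void write(MultipleOutputs mos, String name,
                           Text key, Text value, Reporter reporter)
      throws IOException {
    OutputCollector collector = mos.getCollector(name, reporter);
    collector.collect(key, value);
  }
}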
Usually, a task is killed when
1. The user explicitly kills the task himself
2. The framework kills the task because it did not progress enough
3. The task was speculatively executed and its duplicate attempt finished first
Hence the reason for killing has, more often than not, nothing to do with
the health of the node where it was running,
Hi, all
I am wondering whether HDFS is suitable for other applications, which
may be more general-purpose applications or simply a huge amount of storage?
Any feedback for that?
Thanks.
Lei
Hi Lei,
Is there any particular problem at hand?
Thanks
On Mon, Apr 13, 2009 at 9:01 PM, Lei Xu xule...@gmail.com wrote:
Hi, all
I am wondering whether HDFS is suitable for other applications, which
may be more general-purpose applications or simply a huge amount of storage?
Any
The following very simple program will tell the VM to drop the pages being
cached for a file. I tend to spin this in a for loop when making large tar
files, or otherwise working with large files, and the system performance
really smooths out.
Since it uses open(path), it will churn through the inode
If you pack your images into sequence files, as the value items, the cluster
will automatically do a decent job of ensuring that the input splits made
from the sequence files are local to the map task.
We did this in production at a previous job and it worked very well for us.
Might as well turn
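A bare-bones sketch of that packing step (untested; the class name and argument layout are made up): each image becomes one record in a SequenceFile, with the filename as the key and the raw bytes as the value.

import java.io.File;
import java.io.FileInputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Pack a local directory of images into one SequenceFile in HDFS:
// args[0] = local image directory, args[1] = HDFS output file.
public class PackImages {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, new Path(args[1]), Text.class, BytesWritable.class);

    for (File image : new File(args[0]).listFiles()) {
      byte[] bytes = new byte[(int) image.length()];
      FileInputStream in = new FileInputStream(image);
      try {
        IOUtils.readFully(in, bytes, 0, bytes.length);
      } finally {
        in.close();
      }
      writer.append(new Text(image.getName()), new BytesWritable(bytes));
    }
    writer.close();
  }
}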
I have a nice variant of this in the ch7 examples section of my book,
including a standalone wrapper around the virtual cluster for allowing
multiple test instances to share the virtual cluster, and to make it easier
to poke around in the input and output datasets.
It even works decently