would like to hear from you about whether it continued to grow. One instance
of this I had seen in the past was related to weak references on
socket objects. I do not see that happening here though.
Sent from phone
On Dec 23, 2012, at 10:34 AM, Edward Capriolo edlinuxg...@gmail.com
wrote
releases (>= 0.20.204), several memory and startup
optimizations have been done. They should help you as well.
On Thu, Dec 27, 2012 at 1:48 PM, Edward Capriolo edlinuxg...@gmail.com
wrote:
So it turns out the issue was just the size of the filesystem.
2012-12-27 16:37:22,390 WARN
at 2:22 PM, Edward Capriolo edlinuxg...@gmail.com
wrote:
I am not sure GC was a factor. Even when I forced a GC it cleared 0% of the
memory. One would think that since the entire NameNode image is stored in
memory, the heap would not need to grow beyond that, but that sure does
not seem
Tried this..
NameNode is still Ruining my Xmas on its slow death march to OOM.
http://imagebin.org/240453
On Sat, Dec 22, 2012 at 10:23 PM, Suresh Srinivas sur...@hortonworks.comwrote:
-XX:NewSize=1G -XX:MaxNewSize=1G
this where GC kept falling behind and we either ran out of heap
or would be in full GC. By reducing the heap, we were forcing concurrent mark
sweep to occur and avoided both full GC and running out of heap space, as
the JVM would collect objects more frequently.
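For context, a sketch of how flags like these would typically be applied to the NameNode, assuming the stock hadoop-env.sh and its HADOOP_NAMENODE_OPTS variable (the sizes are illustrative only, not a recommendation):
# conf/hadoop-env.sh (illustrative values only)
export HADOOP_NAMENODE_OPTS="-Xmx17g -XX:NewSize=1G -XX:MaxNewSize=1G \
 -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 \
 -XX:+UseCMSInitiatingOccupancyOnly $HADOOP_NAMENODE_OPTS"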
On Dec 21, 2012, at 8:24 PM, Edward
said that... outside of MapR, have any of the distros
certified themselves on 1.7 yet?
On Dec 22, 2012, at 6:54 AM, Edward Capriolo edlinuxg...@gmail.com
wrote:
I will give this a go. I have actually gone into JMX and manually
triggered GC; no memory is returned. So I assumed something
also be the reason.
Do you collect gc logs? Send that as well.
Sent from a mobile device
On Dec 22, 2012, at 9:51 AM, Edward Capriolo edlinuxg...@gmail.com
wrote:
Newer 1.6 releases are getting close to 1.7, so I am not going to fear a number and
fight the future.
I have been at around 27
but not all and then the line
keeps rising. The delta is about 10-17 hours until the heap is exhausted.
On Sat, Dec 22, 2012 at 7:03 PM, Edward Capriolo edlinuxg...@gmail.comwrote:
Blocks is ~26,000,000 Files is a bit higher ~27,000,000
Currently running:
[root@hnn217 ~]# java -version
java version
I have an old hadoop 0.20.2 cluster. Have not had any issues for a while.
(which is why I never bothered an upgrade)
Suddenly it OOMed last week. Now the OOMs happen periodically. We have a
fairly large NameNode heap (Xmx 17GB). It is a fairly large FS, about
27,000,000 files.
So the strangest
DataJoin is an example. Most people doing joins use Hive or Pig rather
than code them up themselves.
On Tue, Jul 24, 2012 at 5:19 PM, Abhinav M Kulkarni
abhinavkulka...@gmail.com wrote:
Hi,
Do we not have any info on this? Join must be such a common scenario for
most of the people out on
In all my experience you let FileSystem instances close themselves.
On Tue, Jul 24, 2012 at 10:34 AM, Koert Kuipers ko...@tresata.com wrote:
Since FileSystem is a Closeable, I would expect code using it to be like
this:
FileSystem fs = path.getFileSystem(conf);
try {
  // do something with fs
} finally {
  fs.close();
}
In all places I have found it only to be the primary group, not all of
the user's supplemental groups.
On Mon, Jul 16, 2012 at 3:05 PM, Clay B. c...@clayb.net wrote:
Hi all,
I have a Hadoop cluster which uses Samba to map an Active Directory domain
to my CentOS 5.7 Hadoop cluster. However, I
me know what sort of details I can provide to help resolve this
issue.
Best,
Juan
On Fri, Jul 13, 2012 at 4:10 PM, Edward Capriolo edlinuxg...@gmail.comwrote:
If the datanode is not coming back you have to explicitly tell hadoop
to leave safemode.
http://hadoop.apache.org/common/docs/r0.17.2/hdfs_user_guide.html#Safemode
hadoop dfsadmin -safemode leave
On Fri, Jul 13, 2012 at 9:35 AM, Juan Pino juancitomiguel...@gmail.com wrote:
Hi,
I can't get
No. The number of lines is not known at planning time. All you know is
the size of the blocks. You want to look at mapred.max.split.size .
On Sat, Jun 16, 2012 at 5:31 AM, Ondřej Klimpera klimp...@fit.cvut.cz wrote:
I tried this approach, but the job is not distributed among 10 mapper nodes.
It does not matter what the file size is, because the file is
split into blocks, which is what the NN tracks.
For larger deployments you can go with a large block size like 256MB
or even 512MB. Generally the bigger the file the better; split
calculation is very input format dependent, however.
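A minimal sketch of setting that cap from job code, assuming the 0.20-era property name mentioned above (the 256MB figure is just an example):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
// cap each input split at 256MB; blocks, not line counts, drive split planning
conf.setLong("mapred.max.split.size", 256L * 1024 * 1024);
Job job = new Job(conf, "example");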
We actually were in an Amazon vs. host-it-yourself debate with someone,
which prompted us to do some calculations:
http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/myth_busters_ops_editition_is
We calculated the cost for storage alone of 300 TB on ec2 as 585K a month!
The cloud people hate
Maybe you can do some VIEWs or unions or merge tables on the mysql
side to overcome the aspect of launching so many sqoop jobs.
On Thu, May 31, 2012 at 6:02 PM, Srinivas Surasani
hivehadooplearn...@gmail.com wrote:
All,
We are trying to implement sqoop in our environment which has 30 mysql
So a while back there was an article:
http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data
I recently did my own take on full text searching your logs with
solandra, though I have prototyped using solr inside datastax
enterprise as well.
If you are getting a SIGSEGV it never hurts to try a more recent JVM.
21 has many bug fixes at this point.
On Tue, May 22, 2012 at 11:45 AM, Jason B urg...@gmail.com wrote:
JIRA entry created:
https://issues.apache.org/jira/browse/HADOOP-8423
On 5/21/12, Jason B urg...@gmail.com wrote:
Honestly that is a hassle; going from 205 to cdh3u3 is probably more
of a cross-grade than an upgrade or downgrade. I would just stick it
out. But yes, like Michael said, two clusters on the same gear and
distcp. If you are using RF=3 you could also lower your replication to
rf=2 'hadoop dfs
, Alexander Lorenz
wget.n...@googlemail.com wrote:
no. That is the Flume Open Source Mailinglist. Not a vendor list.
NFS logging has nothing to do with decentralized collectors like Flume, JMS
or Scribe.
sent via my mobile device
On Apr 22, 2012, at 12:23 AM, Edward Capriolo edlinuxg
Since each hadoop task is isolated from the others, having more tmp
directories allows you to isolate that disk bandwidth as well. By
listing the disks you give more firepower to the shuffle-sorting and
merging processes.
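A sketch of what listing multiple disks looks like in mapred-site.xml, assuming the 0.20-era mapred.local.dir property; the paths here are hypothetical:
<property>
<name>mapred.local.dir</name>
<value>/disk1/mapred/local,/disk2/mapred/local,/disk3/mapred/local</value>
<description>One local directory per physical disk, comma separated</description>
</property>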
Edward
On Sun, Apr 22, 2012 at 10:02 AM, Jay Vyas jayunit...@gmail.com wrote:
I
It seems pretty relevant. If you can directly log via NFS that is a
viable alternative.
On Sat, Apr 21, 2012 at 11:42 AM, alo alt wget.n...@googlemail.com wrote:
We decided NO product and vendor advertising on apache mailing lists!
I do not understand why you'll put that closed source stuff
Hive is beginning to implement Region support where one metastore will
manage multiple filesystems and jobtrackers. When a query creates a
table it will then be copied to one or more datacenters. In addition
the query planner will intelligently attempt to run queries in regions
only where all the
You can NOT connect to hive thrift to confirm its status. Thrift is
thrift, not http. But you are right to say HiveServer does not produce
any output by default.
If
netstat -nl | grep 1
shows status, it is up.
On Mon, Apr 16, 2012 at 5:18 PM, Rahul Jain rja...@gmail.com wrote:
I am
You need three things. 1. Install snappy in a place the system can pick
it up automatically, or add it to your java.library.path.
Then add the full name of the codec to io.compression.codecs.
hive> set io.compression.codecs;
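A sketch of what the codec list might look like in core-site.xml once the native snappy library is in place (stock codec class names; trim the list to whatever you actually use):
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>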
http://www.edwardcapriolo.com/wiki/en/Tomcat_Hadoop
Have all the hadoop jars and conf files in your classpath
--or-- construct your own conf and URI programmatically:
Configuration conf = new Configuration();
URI i = URI.create("hdfs://192.168.220.200:54310");
FileSystem fs = FileSystem.get(i, conf);
On Fri, Apr 13, 2012 at 7:40 AM, Jessica
Nathan put together the steps on this blog.
http://blog.milford.io/2012/01/kicking-the-tires-on-hadoop-0-23-pseudo-distributed-mode/
Which fills out the missing details such as
<property>
<name>yarn.nodemanager.local-dirs</name>
<value></value>
<description>the local directories used
You are better off on the ML.
Hadoop is designed for high throughput, not low latency, operations.
This carries over to the IRC room :) JK
I feel most hadoop questions are harder to ask and answer on IRC
(large code segments, deep questions) and as a result the mailing list
is more natural for
It has been in a quasi-defunct state for a while now. It seems like
hadoop.next and YARN help achieve a similar effect to HOD. Plus it
has this new hotness factor.
On Fri, Mar 9, 2012 at 2:41 AM, Stijn De Weirdt stijn.dewei...@ugent.be wrote:
(my apologies for those who have received this
Mike,
Snappy is cool and all, but I was not overly impressed with it.
GZ zips much better than Snappy. Last time I checked, for our log files
gzip took them down from 100MB to 40MB, while snappy compressed them
from 100MB to 55MB. That was only with sequence files. But still, that is
pretty significant
On Sun, Feb 26, 2012 at 1:49 PM, Harsh J ha...@cloudera.com wrote:
Hi Mohit,
On Sun, Feb 26, 2012 at 10:42 PM, Mohit Anchlia mohitanch...@gmail.com
wrote:
Thanks! Some questions I have:
1. Would it work with sequence files? I am using
SequenceFileAsTextInputStream
Yes, you just need
On Tue, Feb 21, 2012 at 7:50 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
It looks like in the mapper values are coming as binary instead of Text. Is
this expected from a sequence file? I initially wrote the SequenceFile with Text
values.
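A minimal sketch of writing and then reading a SequenceFile with Text keys and values, using the plain SequenceFile API just to confirm which classes are actually stored (the path is hypothetical):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path p = new Path("/tmp/sample.seq");  // hypothetical path

// write Text key/value pairs
SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf, p, Text.class, Text.class);
writer.append(new Text("key1"), new Text("value1"));
writer.close();

// read them back; the reader's getKeyClass()/getValueClass() show what was stored
SequenceFile.Reader reader = new SequenceFile.Reader(fs, p, conf);
Text k = new Text();
Text v = new Text();
while (reader.next(k, v)) {
  System.out.println(k + " => " + v);
}
reader.close();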
On Tue, Feb 21, 2012 at 4:13 PM, Mohit Anchlia
I would almost agree with that perspective. But there is a problem with the 'java is
slow' theory. The reason is that in a 100 percent write workload GC might
be a factor.
But in the real world people have to read data, and reads become disk bound
as your data gets larger than memory.
Unless C++ can make
for HBase to load 1/2
trillion cells. That makes HBase 10X more expensive in terms of hardware,
power consumption, and data center real estate.
- Doug
On Fri, Feb 17, 2012 at 3:58 PM, Edward Capriolo edlinuxg...@gmail.comwrote:
I would almost agree with that perspective. But there is a problem
You ain't gotta like me, you just mad
Cause I tell it how it is, and you tell it how it might be
-Attributed to Puff Daddy
Now apparently T. Lipcon
On Mon, Feb 13, 2012 at 2:33 PM, Todd Lipcon t...@cloudera.com wrote:
Hey Doug,
Want to also run a comparison test with inter-cluster
Hadoop can work on a number of filesystems: HDFS, S3, local files. The Brisk
file system is known as CFS. CFS stores all block and metadata in
Cassandra. Thus it does not use a namenode. Brisk fires up a jobtracker
automatically as well. Brisk also has a Hive metastore backed by Cassandra
so takes
On Tue, Feb 7, 2012 at 5:24 PM, Eli Finkelshteyn iefin...@gmail.com wrote:
Hi Folks,
This might be a stupid question, but I'm new to Java and Hadoop, so...
Anyway, if I want to check what FileSystem is currently being used at some
point (i.e. evaluating FileSystem.get(conf)), what would be
Task trackers sometimes do not clean up their mapred temp directories well;
if that is the case the TT on startup can spend many minutes deleting
files. I use find to delete files older than a couple of days.
On Friday, January 27, 2012, hadoop hive hadooph...@gmail.com wrote:
Hey Harsh,
but
On Tue, Jan 17, 2012 at 10:08 AM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
Hello,
How much memory/JVM heap does NameNode use for each block?
I've tried locating this in the FAQ and on search-hadoop.com, but
couldn't find a ton of concrete numbers, just these two:
The challenge of this design is that people accessing the same data over and
over again is the uncommon use case for hadoop. Hadoop's bread and butter is
streaming through large datasets that do not fit in memory. Also
your shuffle-sort-spill is going to play havoc on any file system based
The problem with the checkpoint/2NN is that it happily runs and has no
outward indication that it is unable to connect.
Because you have a large edits file your startup will complete, however with
that size it could take hours. It logs nothing while this is going on, but
as long as the CPU is working
I would check out hitune. I have a github project that connects to the
JobTracker and stores counters, job times and other stats into Cassandra.
https://github.com/edwardcapriolo/hadoop_cluster_profiler
Worth checking out as discovering how to connect and mine information from
the JobTracker was
Sounds like a job for next gen map reduce native libraries and GPUs. A
modern day Dr. Frankenstein for sure.
On Saturday, November 19, 2011, Tim Broberg tim.brob...@exar.com wrote:
Perhaps this is a good candidate for a native library, then?
From: Mike
A problem with matrix multiplication in hadoop is that hadoop is row
oriented for the most part. I have thought about this use case however and
you can theoretically turn a 2D matrix into a 1D matrix and then that fits
into the row oriented nature of hadoop. Also being that the typical mapper
can
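The flattening itself is just row-major index arithmetic; a tiny sketch with made-up dimensions:
int numCols = 1000;              // hypothetical matrix width
int row = 7, col = 42;           // hypothetical 2D coordinate
int flat = row * numCols + col;  // 2D -> 1D, fits a row-oriented record
int backRow = flat / numCols;    // 1D -> 2D again
int backCol = flat % numCols;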
This directory can get very large; in many cases I doubt it would fit on a
RAM disk.
Also, RAM disks tend to help most with random read/write; since hadoop is
doing mostly linear IO you may not see a great benefit from the RAM disk.
On Mon, Oct 3, 2011 at 12:07 PM, Vinod Kumar Vavilapalli
On Fri, Sep 30, 2011 at 9:03 AM, bikash sharma sharmabiks...@gmail.comwrote:
Hi,
Does anyone know if Linux containers (which are a kernel supported
virtualization technique for providing resource isolation across
processes/applications) have ever been used with Hadoop to provide resource
On Fri, Sep 23, 2011 at 11:52 AM, ivan.nov...@emc.com wrote:
Hi Harsh,
On 9/22/11 8:48 PM, Harsh J ha...@cloudera.com wrote:
Ivan,
Writing your own program was overkill.
The 'yes' coreutil is pretty silly, but nifty at the same time. It
accepts an argument, which it would repeat
On Fri, Sep 2, 2011 at 4:04 PM, Meng Mao meng...@gmail.com wrote:
We have a compression utility that tries to grab all subdirs of a directory
on HDFS. It makes a call like this:
FileStatus[] subdirs = fs.globStatus(new Path(inputdir, "*"));
and handles files vs dirs accordingly.
We tried to
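Roughly what "handles files vs dirs accordingly" might look like; a sketch only, using the 0.20-era isDir() method:
FileStatus[] subdirs = fs.globStatus(new Path(inputdir, "*"));
for (FileStatus status : subdirs) {
  if (status.isDir()) {
    // queue the directory for compression
  } else {
    // plain file; handle separately
    System.out.println("skipping file " + status.getPath());
  }
}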
On Sun, Aug 21, 2011 at 10:22 AM, Joey Echeverria j...@cloudera.com wrote:
Not that I know of.
-Joey
On Fri, Aug 19, 2011 at 1:16 PM, modemide modem...@gmail.com wrote:
Ha, what a silly mistake.
Thank you Joey.
Do you also happen to know of an easier way to tell which racks the
This should explain it http://jz10.java.no/java-4-ever-trailer.html .
On Tue, Aug 16, 2011 at 1:17 PM, Adi adi.pan...@gmail.com wrote:
On Mon, Aug 15, 2011 at 9:00 PM, Chris Song sjh...@gmail.com wrote:
Why should hadoop be built in Java?
For integrity and stability, it is
On Wed, Aug 3, 2011 at 6:10 AM, praveenesh kumar praveen...@gmail.comwrote:
Hi,
Anyone working on YCSB (Yahoo! Cloud Serving Benchmark) for HBase??
I am trying to run it, it's giving me an error:
$ java -cp build/ycsb.jar com.yahoo.ycsb.CommandLine -db
com.yahoo.ycsb.db.HBaseClient
YCSB
On Tue, Jul 5, 2011 at 5:28 PM, Jim Falgout jim.falg...@pervasive.comwrote:
I've done this before by placing the name of each file to process into a
single file (newline separated) and using the NLineInputFormat class as the
input format. Run your job with the single file with all of the file
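A rough sketch of that setup with the old mapred API, assuming org.apache.hadoop.mapred.lib.NLineInputFormat and its mapred.line.input.format.linespermap setting (the list-file path is made up):
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

JobConf job = new JobConf();
job.setInputFormat(NLineInputFormat.class);
// each map task gets N lines of the list file; here one filename per mapper
job.setInt("mapred.line.input.format.linespermap", 1);
FileInputFormat.setInputPaths(job, new Path("/user/me/file-list.txt")); // hypothetical list file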
On Tue, Jul 5, 2011 at 10:05 AM, jeff.schm...@shell.com wrote:
Um kill -9 pid ?
-Original Message-
From: Juwei Shi [mailto:shiju...@gmail.com]
Sent: Friday, July 01, 2011 10:53 AM
To: common-user@hadoop.apache.org; mapreduce-u...@hadoop.apache.org
Subject: Jobs are still in
On Tue, Jul 5, 2011 at 11:45 AM, Juwei Shi shiju...@gmail.com wrote:
We sometimes have hundreds of map or reduce tasks for a job. I think it is
hard to find all of them and kill the corresponding jvm processes. If we do
not want to restart hadoop, is there any automatic method?
2011/7/5
That looks like an ancient version of java. Get 1.6.0_u24 or 25 from oracle.
Upgrade to a recent java and possibly update your c libs.
Edward
On Fri, Jul 1, 2011 at 7:24 PM, Shi Yu sh...@uchicago.edu wrote:
I had difficulty upgrading applications from Hadoop 0.20.2 to Hadoop
0.20.203.0.
We have run into this issue as well. Since hadoop is round-robin writing, different
size disks really screw things up royally, especially if you are running at
high capacity. We have found that decommissioning hosts for stretches of
time is more effective than the balancer in extreme situations. Another
On Sun, Jun 5, 2011 at 1:04 PM, Shi Yu sh...@uchicago.edu wrote:
We just upgraded from 0.20.2 to hadoop-0.20.203.0
Running the same code ends up with a massive amount of debug
information on the screen output. Normally this type of
information is written to the logs/userlogs directory. However,
All,
You know the story:
You have data files that are created every 5 minutes.
You have hundreds of servers.
You want to put those files in hadoop.
Eventually:
You get lots of files and blocks.
Your namenode and secondary name node need more memory (BTW JVM's have
issues at large Xmx values).
On Tue, May 31, 2011 at 2:50 PM, W.P. McNeill bill...@gmail.com wrote:
I'm launching long-running tasks on a cluster running the Fair Scheduler.
As I understand it, the Fair Scheduler is preemptive. What I expect to see
is that my long-running jobs sometimes get killed to make room for other
On Sat, May 21, 2011 at 4:13 PM, highpointe highpoint...@gmail.com wrote:
Does this copy text bother anyone else? Sure winning any award is great
but
does hadoop want to be associated with innovation like WikiLeaks?
[Only] through the free distribution of information, the guaranteed
On Sun, May 22, 2011 at 7:29 PM, Todd Lipcon t...@cloudera.com wrote:
C'mon guys -- while this is of course an interesting debate, can we
please keep it off common-user?
-Todd
On Sun, May 22, 2011 at 3:30 PM, Edward Capriolo edlinuxg...@gmail.com
wrote:
On Sat, May 21, 2011 at 4:13 PM
On Sun, May 22, 2011 at 8:44 PM, Todd Lipcon t...@cloudera.com wrote:
On Sun, May 22, 2011 at 5:10 PM, Edward Capriolo edlinuxg...@gmail.com
wrote:
Correct. But it is a place to discuss changing the content of
http://hadoop.apache.org which is what I am advocating.
Fair enough
Good job. I brought this up in another thread, but was told it was not a
problem. Good thing I'm not crazy.
On Sat, May 21, 2011 at 12:42 AM, Joe Stein
charmal...@allthingshadoop.comwrote:
I came up with a nice little hack to trick hadoop into calculating disk
usage with df instead of du
On Thu, May 19, 2011 at 11:54 AM, Ted Dunning tdunn...@maprtech.com wrote:
ZK started as a sub-project of Hadoop.
On Thu, May 19, 2011 at 7:27 AM, M. C. Srivas mcsri...@gmail.com wrote:
Interesting to note that Cassandra and ZK are now considered Hadoop
projects.
There were independent
http://hadoop.apache.org/#What+Is+Apache%E2%84%A2+Hadoop%E2%84%A2%3F
March 2011 - Apache Hadoop takes top prize at Media Guardian Innovation
Awards
The Hadoop project won the innovator of the year award from the UK's
Guardian newspaper, where it was described as having the potential to be a
greater
On Mon, Apr 11, 2011 at 7:05 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
Yes you can, however it will require customization of HDFS. Take a
look at HDFS-347, specifically the HDFS-347-branch-20-append.txt patch.
I have been altering it for use with HBASE-3529. Note that the patch
always cached, so calculating the 'du -sk' on a
host, even with hundreds of thousands of files, generally only uses high
i/o for a couple of seconds. I am using 2TB disks too.
Sridhar
On Fri, Apr 8, 2011 at 12:15 AM, Edward Capriolo edlinuxg...@gmail.com
wrote:
I have a 0.20.2 cluster. I
I have a 0.20.2 cluster. I notice that our nodes with 2 TB disks waste
tons of disk io doing a 'du -sk' of each data directory. Instead of
'du -sk', why not just do this with java.io.File? How is this going to
work with 4TB, 8TB disks and up? It seems like calculating used and
free disk space could
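For comparison, a sketch of the java.io.File route; note that getTotalSpace/getUsableSpace report per-partition numbers from the OS, so they answer the free-space question but not the per-directory usage that du computes (the path is hypothetical):
import java.io.File;

File dataDir = new File("/data1/dfs/data");   // hypothetical dfs.data.dir
long totalBytes  = dataDir.getTotalSpace();   // size of the whole partition
long usableBytes = dataDir.getUsableSpace();  // free space available to this JVM
System.out.println("used on partition: " + (totalBytes - usableBytes));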
On Thu, Mar 31, 2011 at 10:43 AM, XiaoboGu guxiaobo1...@gmail.com wrote:
I have trouble browsing the file system via the namenode web interface; the namenode
says in its log file that the -G option is invalid to get the groups for the
user.
I thought this was not the case any more but hadoop forks to
On Thursday, March 17, 2011, Marc Sturlese marc.sturl...@gmail.com wrote:
Is there any way to check if a seqfile is corrupted without iterating over all
its keys/values till it crashes?
I've seen that I can get an IOException when opening it or an IOException
reading the X key/value (depending
On Thu, Mar 17, 2011 at 1:20 PM, jigar shah js...@pandora.com wrote:
Hi,
we are running a 50 node hadoop cluster and have a problem with these
attempt directories piling up (e.g. attempt_201101170925_126956_m_000232_0)
and taking a lot of space. When I restart the tasktracker daemon these
On Mon, Mar 14, 2011 at 1:23 PM, He Chen airb...@gmail.com wrote:
Hi all
Any suggestions?
Bests
Chen
Images have been banned.
On Thu, Mar 10, 2011 at 12:48 AM, Adarsh Sharma
adarsh.sha...@orkash.com wrote:
Thanks Harsh, i.e. why, if we format the namenode again after loading some data,
the INCOMPATIBLE NAMESPACE IDs error occurs.
Best Regards,
Adarsh Sharma
Harsh J wrote:
Formatting the NameNode initializes the
On Thu, Mar 3, 2011 at 10:00 AM, Tom Deutsch tdeut...@us.ibm.com wrote:
Along with Brian I'd also suggest it depends on what you are doing with
the images, but we used Hadoop specifically for this purpose in several
solutions we built to do advanced image processing. Both scale-out
ability
On Fri, Feb 11, 2011 at 7:14 PM, Ted Dunning tdunn...@maprtech.com wrote:
Bandwidth is definitely better with more active spindles. I would recommend
several larger disks. The cost is very nearly the same.
On Fri, Feb 11, 2011 at 3:52 PM, Shrinivas Joshi jshrini...@gmail.comwrote:
Thanks
On Thu, Jan 27, 2011 at 5:42 AM, Steve Loughran ste...@apache.org wrote:
On 27/01/11 07:28, Manuel Meßner wrote:
Hi,
you may want to take a look at the streaming api, which allows users
to write their map-reduce jobs in any language which is capable of
writing to stdout and reading
On Sat, Jan 22, 2011 at 9:59 PM, Ted Yu yuzhih...@gmail.com wrote:
In the test code, JobTracker is returned from:
mr = new MiniMRCluster(0, 0, 0, "file:///", 1, null, null, null,
conf);
jobTracker = mr.getJobTrackerRunner().getJobTracker();
I guess it is not exposed in non-test
On Fri, Jan 21, 2011 at 9:56 AM, abhatna...@vantage.com
abhatna...@vantage.com wrote:
Where is this file located?
Also does anyone have a sample
--
View this message in context:
http://lucene.472066.n3.nabble.com/Hive-rc-tp2296028p2302262.html
Sent from the Hadoop lucene-users mailing list
On Wed, Jan 19, 2011 at 1:32 PM, Marc Farnum Rendino mvg...@gmail.com wrote:
On Tue, Jan 18, 2011 at 8:59 AM, Adarsh Sharma adarsh.sha...@orkash.com
wrote:
I want to know *AT WHAT COSTS* it comes.
10-15% is tolerable but at this rate, it needs some work.
As Steve rightly suggests, I am in
On Mon, Jan 17, 2011 at 8:13 AM, Adarsh Sharma adarsh.sha...@orkash.com wrote:
Harsh J wrote:
Could you re-check your permissions on the $(dfs.data.dir)s for your
failing DataNode versus the user that runs it?
On Mon, Jan 17, 2011 at 6:33 PM, Adarsh Sharma adarsh.sha...@orkash.com
wrote:
On Mon, Jan 17, 2011 at 6:08 AM, Steve Loughran ste...@apache.org wrote:
On 17/01/11 04:11, Adarsh Sharma wrote:
Dear all,
Yesterday I performed a kind of testing between *Hadoop on Standalone
Servers* and *Hadoop in the Cloud*.
I established a Hadoop cluster of 4 nodes (standalone machines) in
On Fri, Jan 14, 2011 at 5:05 PM, Attila Csordas attilacsor...@gmail.com wrote:
Hi,
what other jars should be added to the build path from 0.21.0
besides hadoop-common-0.21.0.jar in order to make 0.21.0 NLineInputFormat
work in 0.20.2 as suggested below?
Generally can somebody provide me a
On Tue, Dec 28, 2010 at 11:36 PM, Hemanth Yamijala yhema...@gmail.com wrote:
Hi,
On Tue, Dec 28, 2010 at 6:03 PM, Rajgopal Vaithiyanathan
raja.f...@gmail.com wrote:
I wrote a script to map the IPs to a rack. The script is as follows:
for i in $* ; do
topo=`echo $i | cut -d.
2010/12/7 Petrucci Andreas petrucci_2...@hotmail.com:
hello there, I'm trying to compile libhdfs but there are some
problems. According to http://wiki.apache.org/hadoop/MountableHDFS I have
already installed fuse. With ant compile-c++-libhdfs -Dlibhdfs=1 the build is
successful.
On Tue, Nov 30, 2010 at 3:21 AM, Harsh J qwertyman...@gmail.com wrote:
Hey,
On Tue, Nov 30, 2010 at 4:56 AM, Marc Sturlese marc.sturl...@gmail.com
wrote:
Hey there,
I am doing some tests and wondering what the best practices are to deal
with very small files which are continuously being
On Sat, Nov 13, 2010 at 9:50 PM, Todd Lipcon t...@cloudera.com wrote:
We do have policies against breaking APIs between consecutive major versions
except for very rare exceptions (eg UnixUserGroupInformation went away when
security was added).
We do *not* have any current policies that
On Sat, Nov 13, 2010 at 4:33 PM, Shi Yu sh...@uchicago.edu wrote:
I agree with Steve. That's why I am still using 0.19.2 in my production.
Shi
On 2010-11-13 12:36, Steve Lewis wrote:
Our group made a very poorly considered decision to build our cluster
using
Hadoop 0.21
We discovered
On Mon, Sep 27, 2010 at 3:23 AM, Keith Wiley kwi...@keithwiley.com wrote:
Is there a particularly good reason for why the hadoop fs command supports
-cat and -tail, but not -head?
Keith Wiley
number so that it can attempt to *detect* the
type of the file.
Cheers
On Fri, Sep 24, 2010 at 11:41 PM, Edward Capriolo
edlinuxg...@gmail.comwrote:
Many times a hadoop job produces a file per reducer and the job has
many reducers. Or a map-only job produces one output file per input file and
you
On Mon, Sep 27, 2010 at 11:13 AM, Keith Wiley kwi...@keithwiley.com wrote:
On 2010, Sep 27, at 7:02 AM, Edward Capriolo wrote:
On Mon, Sep 27, 2010 at 3:23 AM, Keith Wiley kwi...@keithwiley.com
wrote:
Is there a particularly good reason for why the hadoop fs command
supports
-cat and -tail
Many times a hadoop job produces a file per reducer and the job has
many reducers. Or a map-only job produces one output file per input file and
you have many input files. Or you just have many small files from some
external process. Hadoop has sub-optimal handling of small files.
There are some ways to
It is a bad idea to permanently disable the 2NN. The edits file grows very,
very large and will not be processed until the namenode restarts. We
had a 12GB edits file that took 40 minutes of downtime to process.
On Thu, Sep 9, 2010 at 3:08 AM, Jeff Zhang zjf...@gmail.com wrote:
then, do not start
On Wed, Sep 8, 2010 at 1:06 PM, Matthew John tmatthewjohn1...@gmail.com wrote:
Hi guys,
I'm trying to run a sort on a metafile which has records consisting of a
key (8 bytes) and a value (32 bytes). The sort will be with respect to the key.
But my input file does not have a header. So in order to avail
The fact that the memory is high is not necessarily a bad thing.
Faster garbage collection implies more CPU usage.
I had some success following the tuning advice here, to make my memory
usage less spikey
http://blog.mikiobraun.de/2010/08/cassandra-gc-tuning.html
Again, less spikes != better
) {
String[] keyval = prop.split("=", 2);
if (keyval.length == 2) {
conf.set(keyval[0], keyval[1]);
}
}
}
You can add a log after the bold line to verify that all -D options are
returned.
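The usual way to get -D options applied to a job's Configuration is to run the driver through ToolRunner, which hands them to GenericOptionsParser; a minimal sketch (the class name is made up, the property key is the one from this thread):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyDriver extends Configured implements Tool {  // hypothetical driver
  public int run(String[] args) throws Exception {
    Configuration conf = getConf();  // -D key=value options are already applied here
    System.out.println(conf.get("hive2rdbms.jdbc.driver"));
    return 0;
  }
  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
  }
}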
On Thu, Sep 2, 2010 at 10:09 AM, Edward Capriolo edlinuxg
This is 0.20.0
I have an eclipse run configuration passing these as arguments
-D hive2rdbms.jdbc.driver=com.mysql.jdbc.Driver -D
hive2rdbms.connection.url=jdbc:mysql://localhost:3306/test -D
hive2rdbms.data.query=SELECT id,name FROM name WHERE $CONDITIONS -D
hive2rdbms.bounding.query=SELECT
On Tue, Aug 31, 2010 at 5:07 PM, Gang Luo lgpub...@yahoo.com.cn wrote:
Hi all,
I am the administrator of a hadoop cluster. I want to know how to specify the
group a user belongs to. Or does hadoop just use the group/user information from the
linux system it runs on? For example, if a user 'smith'
I am working with DataDrivenOutputFormat from trunk. None of the unit
tests seem to test the bounded queries
Configuration conf = new Configuration();
Job job = new Job(conf);
job.setJarByClass(TestZ.class);
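Continuing that setup, a hedged sketch of how a bounding query is typically wired in with DataDrivenDBInputFormat from org.apache.hadoop.mapreduce.lib.db; NameRecord here is a hypothetical DBWritable for the id,name columns, and the bounding query shown is only illustrative:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DataDrivenDBInputFormat;

Configuration conf = new Configuration();
DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
    "jdbc:mysql://localhost:3306/test");            // values taken from the -D args above
Job job = new Job(conf);
job.setJarByClass(TestZ.class);
// NameRecord would be a DBWritable implementation for the id,name columns (hypothetical)
DataDrivenDBInputFormat.setInput(job, NameRecord.class,
    "SELECT id,name FROM name WHERE $CONDITIONS",   // data query; $CONDITIONS is replaced per split
    "SELECT MIN(id), MAX(id) FROM name");           // bounding query (illustrative)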