Dear Hadoop User Group,
What are elegant ways to run mapred jobs on text-based data encoded with
something other than UTF-8?
It looks like Hadoop assumes text data is always in UTF-8 and handles it
that way, encoding and decoding with UTF-8.
And whenever the data is not in UTF-8
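One common workaround (a sketch, not an official Hadoop mechanism; the helper name is hypothetical) is to keep the raw value bytes and decode them yourself with the source charset, instead of letting Text decode them as UTF-8:

```java
import java.nio.charset.Charset;

public class CharsetDecode {
    // Hypothetical helper: decode a raw byte payload (e.g. the bytes a
    // mapper might pull out of a BytesWritable) using an explicit
    // charset instead of assuming UTF-8.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // "h\u00e9llo" in ISO-8859-1: '\u00e9' is the single byte 0xE9,
        // which is not valid UTF-8 and would be mangled by a UTF-8 decoder.
        byte[] latin1 = {'h', (byte) 0xE9, 'l', 'l', 'o'};
        System.out.println(decode(latin1, "ISO-8859-1")); // héllo
    }
}
```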
Hi!
I've noticed that streaming has big problems handling long lines.
In my particular case the output of a reducer process takes a very long time to run
and sometimes crashes with a number of random effects, a Java OutOfMemoryError
being the nicest one.
(which is a fact. A reducer
Hi!
When we run map-reduce on Hadoop, does the map run on a single node
or in parallel on several nodes?
If it runs in parallel, the input file must be split. How can
Hadoop split the input file at the right position? For example, in wordcount,
the input file cannot be divided at
thanks for your help
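On the split question: text input in Hadoop can be split at arbitrary byte offsets because each reader (other than the first) simply skips forward to the first newline past its split start, and the previous reader reads one line beyond its split end. A rough sketch of that boundary convention in plain Java (an assumed simplification, not the actual LineRecordReader source):

```java
public class SplitBoundary {
    // Sketch of the line-boundary convention for text input splits:
    // a reader whose split starts mid-file discards bytes up to and
    // including the first '\n', so the previous reader owns that line.
    static int firstLineStart(byte[] data, int splitStart) {
        if (splitStart == 0) return 0;        // the first split owns the file head
        int i = splitStart;
        while (i < data.length && data[i] != '\n') i++;
        return Math.min(i + 1, data.length);  // position just past the newline
    }

    public static void main(String[] args) {
        byte[] file = "one two\nthree four\nfive\n".getBytes();
        // A split starting at byte 3 (inside "one two") begins reading
        // at offset 8, the start of "three four".
        System.out.println(firstLineStart(file, 3)); // 8
    }
}
```

This is why wordcount still counts every word exactly once even though split boundaries fall mid-line.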
Please, I need more explanation on these points:
* it is not too far away, network-wise
What do you mean by network-wise? What are the requirements for the
connection between the client and server? I ask because I think my
cluster is protected by a firewall.
* the client hadoop
Yongqiang:
Thanks for this information. I'll try your changes and see if the experiment
runs better.
Thanks,
C G
--- On Mon, 7/7/08, heyongqiang [EMAIL PROTECTED] wrote:
From: heyongqiang [EMAIL PROTECTED]
Subject: Re: Re: Hadoop 0.17.0 - lots of I/O problems and can't run small
datasets?
My mapred program uses a custom library, AlignmentLib.jar, that is distributed
using DistributedCache. I'm testing my program in pseudo-distributed mode
on my workstation, but somehow it doesn't seem able to open the library. This
is the exception I got for the task (note the line in bold):
Hi
I am using code for a reader that must be passed a filename in order to create a
FileInputStream instance, which uses getChannel() to read the file. I have to
use FileInputStream because it is processing image files and it's faster than
InputStream.
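For reference, the FileChannel pattern described above looks roughly like this (a generic sketch, not the poster's actual imagereader code; note how it is tied to a local filename, which is what makes it awkward against HDFS paths):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class ChannelRead {
    // Generic sketch: read a whole local file through FileChannel.
    // The API requires a filename on the local filesystem, so it cannot
    // be pointed directly at a path inside HDFS.
    static byte[] readAll(String filename) throws Exception {
        try (FileInputStream in = new FileInputStream(filename);
             FileChannel ch = in.getChannel()) {
            ByteBuffer buf = ByteBuffer.allocate((int) ch.size());
            while (buf.hasRemaining() && ch.read(buf) != -1) { }
            return buf.array();
        }
    }

    public static void main(String[] args) throws Exception {
        File tmp = File.createTempFile("demo", ".bin");
        tmp.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(tmp)) {
            out.write(new byte[]{1, 2, 3});
        }
        System.out.println(readAll(tmp.getPath()).length); // 3
    }
}
```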
I can run this code locally, but when I
Oh, I forgot to include the imagereader code that must take the filename as an
argument. Note, this is not my code; I found it on the web to process images.
I'd like to use it within my Hadoop job, in my custom reader, to process a chunk
and hand it off to the mappers.
The imagereader code:
The next Hadoop User Group meeting is scheduled for July 22nd from 6 -
7:30 pm at Yahoo! Mission College, Building 1, Training Rooms 3 and 4.
Agenda:
Cascading - Chris Wenzel
Performance Benchmarking on Hadoop (Terabyte Sort, Gridmix) - Sameer
Paranjpye, Owen O'Malley, Runping Qi
After you format the namenode a second time, your datanodes and namenode
may be left inconsistent, namely, with an incompatible namespace.
On 7/2/08, Xuan Dzung Doan [EMAIL PROTECTED] wrote:
I was following the Hadoop 0.16.4 quickstart guide exactly to run a
pseudo-distributed operation on
Where can I find the Reverse-Index application?
heyongqiang
2008-07-09
From: Shengkai Zhu
Sent: 2008-07-09 09:06:38
To: core-user@hadoop.apache.org
Cc:
Subject: Re: modified word count example
Another Map Reduce application, Reverse-Index, behaves similarly to your
description.
You can refer
heyongqiang wrote:
The ipc.Client object is designed to be shared across threads, and each
thread can only make synchronous rpc calls, which means each thread makes a call and
waits for a result or error. This is implemented by a novel technique: each
thread makes a distinct call (with a different call
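A toy illustration of that call-id multiplexing (an assumed simplification with hypothetical names, not the real ipc.Client code): threads share one client, each call gets a unique id, and the receiving side routes each response to whichever caller is parked on that id.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class CallMultiplexer {
    // Toy model of the technique: one shared client, many threads; each
    // call is registered under a unique id, and responses are matched
    // back to the blocked caller by that id.
    static class Call {
        final int id;
        volatile String value;
        final CountDownLatch done = new CountDownLatch(1);
        Call(int id) { this.id = id; }
    }

    private final AtomicInteger nextId = new AtomicInteger();
    private final Map<Integer, Call> pending = new ConcurrentHashMap<>();

    // Caller side: register the call, "send" it, block until its response.
    String call(String request) throws InterruptedException {
        Call c = new Call(nextId.getAndIncrement());
        pending.put(c.id, c);
        deliver(c.id, "echo:" + request); // stand-in for the server round trip
        c.done.await();
        pending.remove(c.id);
        return c.value;
    }

    // Receiver side: route a response to the waiting caller by id.
    void deliver(int id, String response) {
        Call c = pending.get(id);
        if (c != null) { c.value = response; c.done.countDown(); }
    }

    public static void main(String[] args) throws Exception {
        CallMultiplexer mux = new CallMultiplexer();
        System.out.println(mux.call("ping")); // echo:ping
    }
}
```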
It's an example M-R application in Phoenix, coded in C.
I've no idea whether there's a popular Hadoop version of it, so I ported it
into a Hadoop-style application.
FYI. Src attached.
On 7/9/08, heyongqiang [EMAIL PROTECTED] wrote:
Where can I find the Reverse-Index application?
heyongqiang
Actually this test result is a good result; it was just my misunderstanding of
the result. My mistake.
The second column is actually the average download rate per thread. And this
posted test was run on one node; we also ran the test simultaneously on multiple
nodes, and the performance results seem
# bin/hadoop dfs -put conf input
08/06/29 09:38:42 INFO dfs.DFSClient:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /
user/root/input/hadoop-env.sh could only be replicated to 0 nodes,
instead of 1
Looks like your datanode didn't come up; is there anything in the logs?
You need to delete the hadoop-root directory which has been created through DFS.
Usually hadoop creates this directory in /tmp/.
After deleting the directory, just follow the instructions once again. It
will work.
2008/7/9 Arun C Murthy [EMAIL PROTECTED]:
# bin/hadoop dfs -put conf input
Hello,
On a cluster where I run Hadoop, it seems that the temp directory created by
Hadoop (in our case, /tmp/hadoop/) gets its permissions set to drwxrwxr-x,
owned by the first person who runs a job after the Hadoop services are
started. This causes file permission problems as we try to run