MapReduce with multi-languages

2008-07-08 Thread Taeho Kang
Dear Hadoop User Group, What are elegant ways to do mapred jobs on text-based data encoded with something other than UTF-8? It looks like Hadoop assumes the text data is always in UTF-8 and handles data that way - encoding with UTF-8 and decoding with UTF-8. And whenever the data is not in UTF-8
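A workaround often suggested for this situation is to carry records as raw bytes and decode them explicitly in the mapper with the data's actual charset, rather than letting Hadoop's Text class assume UTF-8. A minimal stdlib-only sketch of the decoding step (the charset names are illustrative assumptions; the EUC-KR bytes in the comment are just an example of non-UTF-8 input):

```java
import java.nio.charset.Charset;

public class CharsetDecode {
    // Decode raw record bytes with an explicit charset instead of
    // assuming UTF-8. The result is a normal Java String, which can
    // then be re-encoded however the job needs.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) throws Exception {
        // e.g. "한글" in EUC-KR is the bytes C7 D1 B1 DB; here we
        // round-trip ASCII to keep the demo self-contained.
        byte[] raw = "hello".getBytes("US-ASCII");
        System.out.println(decode(raw, "US-ASCII"));
    }
}
```

In a real job the byte[] would come from the input value (e.g. Text.getBytes() up to Text.getLength()), not from a literal.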

streaming problem

2008-07-08 Thread Andreas Kostyrka
Hi! I've noticed that streaming has big problems handling long lines. In my special case the output of a reducer process takes a very long time to run and sometimes crashes with a number of random effects, a Java OutOfMemory being the nicest one. (which is a fact. A reducer

Is map running in parallel?

2008-07-08 Thread hong
Hi! When we run map-reduce on Hadoop, does the map run on a single node or in parallel on several nodes? If it runs in parallel, the input file should be split. How can Hadoop split the input file at the right position? For example, in wordcount, the input file cannot be divided at
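For context on the question above: maps do run in parallel, one per input split, and a line-oriented record reader handles splits that fall mid-line by skipping the partial first line (the previous split finishes it) and reading past its own end to complete its last line. A loose, stdlib-only sketch of the "where does my first record start" rule (this mimics the idea, not Hadoop's exact implementation):

```java
public class SplitBoundary {
    // A split that starts at offset 0 begins at the first record.
    // Any other split begins just after the first '\n' found at or
    // beyond (splitStart - 1); the line cut by the split boundary
    // belongs to the previous split, which reads past its end.
    static int firstRecordStart(byte[] data, int splitStart) {
        if (splitStart == 0) return 0;
        for (int i = splitStart - 1; i < data.length; i++) {
            if (data[i] == '\n') return i + 1;
        }
        return data.length; // no complete record begins in this split
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "aaa\nbbb\nccc\n".getBytes("US-ASCII");
        // A split starting mid-"aaa" begins its records at "bbb".
        System.out.println(firstRecordStart(data, 2)); // prints 4
    }
}
```

Note the `splitStart - 1` scan: it ensures a split that starts exactly at the beginning of a line still claims that line, so every line is processed exactly once.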

Re: running hadoop remotely from inside a java program

2008-07-08 Thread Deyaa Adranale
Thanks for your help. Please, I need more explanation on these: * it is not too far away, network-wise: what do you mean by "network-wise"? What are the requirements for the connection between the client and the server? Because I think that my cluster is protected with a firewall * the client hadoop

Re: Re: Hadoop 0.17.0 - lots of I/O problems and can't run small datasets?

2008-07-08 Thread C G
Yongqiang:   Thanks for this information.  I'll try your changes and see if the experiment runs better.   Thanks, C G --- On Mon, 7/7/08, heyongqiang [EMAIL PROTECTED] wrote: From: heyongqiang [EMAIL PROTECTED] Subject: Re: Re: Hadoop 0.17.0 - lots of I/O problems and can't run small datasets?

RemoteException - IOException on cached file

2008-07-08 Thread Xuan Dzung Doan
My mapred program uses a custom library AlignmentLib.jar that is distributed using DistributedCache. I'm testing my program in the pseudo distributed mode in my workstation, but somehow it doesn't seem able to open the library. This is the exception I got for the task (note the line in bold):
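A detail worth checking in this setup: DistributedCache resolves paths against the default FileSystem, so a jar that exists only on the local disk will fail to open in the task even in pseudo-distributed mode. A hedged sketch of the 0.17-era classpath setup (the HDFS path and the MyJob class are assumptions for illustration, not the poster's actual code):

```java
// Assumes AlignmentLib.jar has already been copied into HDFS, e.g. with
// bin/hadoop dfs -put. The path and job class below are placeholders.
JobConf conf = new JobConf(MyJob.class);
DistributedCache.addFileToClassPath(new Path("/user/hadoop/lib/AlignmentLib.jar"), conf);
```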

HDFS files

2008-07-08 Thread Kayla Jay
Hi, I am using code for a reader that must pass in a filename in order to create a FileInputStream instance that uses getChannel() to read the file. I have to use FileInputStream because it is processing image files and it's faster than InputStream. I can run this code locally, but when I
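One possible adaptation for the problem above: FileInputStream.getChannel() only works on local files, but java.nio.channels.Channels.newChannel can wrap any InputStream (such as the stream returned by opening an HDFS file) in a ReadableByteChannel, so channel-oriented reader code can often be reused. A self-contained sketch, with a ByteArrayInputStream standing in for the HDFS stream (that substitution is an assumption to keep the demo runnable):

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

public class StreamToChannel {
    // Wrap any InputStream (e.g. what FileSystem.open() returns for an
    // HDFS path) so that code expecting a channel can consume it.
    static ReadableByteChannel asChannel(InputStream in) {
        return Channels.newChannel(in);
    }

    public static void main(String[] args) throws Exception {
        InputStream in = new ByteArrayInputStream("pixels".getBytes("US-ASCII"));
        ReadableByteChannel ch = asChannel(in);
        ByteBuffer buf = ByteBuffer.allocate(16);
        int n = ch.read(buf); // fills buf with the stream's bytes
        System.out.println(n);
    }
}
```

The resulting channel is not a FileChannel (no position() or map()), so code relying on random access would still need the file copied locally first.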

Re: HDFS files

2008-07-08 Thread Kayla Jay
Oh, forgot to include the imagereader code that must take in the filename as an argument. Note, this is not my code. I found it on the web to process images. I'd like to use it within my hadoop job in my custom reader to process a chunk and hand off to the mappers. The imagereader code:

Monthly Hadoop User Group Meeting

2008-07-08 Thread Ajay Anand
The next Hadoop User Group meeting is scheduled for July 22nd from 6 - 7:30 pm at Yahoo! Mission College, Building 1, Training Rooms 3 and 4. Agenda: Cascading - Chris Wenzel Performance Benchmarking on Hadoop (Terabyte Sort, Gridmix) - Sameer Paranjpye, Owen O'Malley, Runping Qi

Re: Failed to repeat the Quickstart guide for Pseudo-distributed operation

2008-07-08 Thread Shengkai Zhu
After you format the namenode a second time, your datanodes and namenode may be in an inconsistent state, namely, under incompatible namespaces. On 7/2/08, Xuan Dzung Doan [EMAIL PROTECTED] wrote: I was exactly following the Hadoop 0.16.4 quickstart guide to run a Pseudo-distributed operation on

Re: Re: modified word count example

2008-07-08 Thread heyongqiang
Where can I find the Reverse-Index application? heyongqiang 2008-07-09 From: Shengkai Zhu Sent: 2008-07-09 09:06:38 To: core-user@hadoop.apache.org Cc: Subject: Re: modified word count example Another Map Reduce application, Reverse-Index, behaves similarly to your description. You can refer

Re: hadoop download performace when user app adopt multi-thread

2008-07-08 Thread Samuel Guo
heyongqiang wrote: the ipc.Client object is designed to be shared across threads, and each thread can only make synchronized rpc calls, which means each thread calls and waits for a result or error. This is implemented by a novel technique: each thread makes a distinct call (with different call
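The call-multiplexing technique described above can be sketched with the stdlib alone: each caller registers a pending call under a unique id before sending it, then blocks until a single connection-reader thread completes it. This loosely mirrors the idea behind a shared RPC client; it is not Hadoop's actual ipc.Client code:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class CallMultiplexer {
    private final AtomicInteger nextId = new AtomicInteger();
    private final Map<Integer, CompletableFuture<String>> pending =
            new ConcurrentHashMap<>();

    // Caller side: reserve a call id, then block on its future.
    int register() {
        int id = nextId.getAndIncrement();
        pending.put(id, new CompletableFuture<>());
        return id;
    }

    String await(int id) throws Exception {
        try {
            return pending.get(id).get(); // wait for the reader thread
        } finally {
            pending.remove(id);
        }
    }

    // Reader side: one thread drains the shared connection and routes
    // each response to the caller that owns its id.
    void complete(int id, String response) {
        CompletableFuture<String> f = pending.get(id);
        if (f != null) f.complete(response);
    }

    public static void main(String[] args) throws Exception {
        CallMultiplexer mux = new CallMultiplexer();
        int id = mux.register();
        // Simulated reader thread delivering this call's response.
        new Thread(() -> mux.complete(id, "ok:" + id)).start();
        System.out.println(mux.await(id)); // prints ok:0
    }
}
```

Many threads can thus share one socket: only the response routing needs synchronization, not the waiting callers.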

Re: Re: modified word count example

2008-07-08 Thread Shengkai Zhu
It's an example M-R application in Phoenix, coded in C. I have no idea whether there's a popular hadoop version of it, so I ported it into a hadoop-style application. FYI, src attached. On 7/9/08, heyongqiang [EMAIL PROTECTED] wrote: where i can find the Reverse-Index application? heyongqiang

Re: Re: hadoop download performace when user app adopt multi-thread

2008-07-08 Thread heyongqiang
Actually this test result is a good result; it was just my misunderstanding of it, my mistake. The second column is actually the average download rate per thread. And this post's test was run on one node; we also ran tests simultaneously on multiple nodes, and the performance results seem

Re: Failed to repeat the Quickstart guide for Pseudo-distributed operation

2008-07-08 Thread Arun C Murthy
# bin/hadoop dfs -put conf input 08/06/29 09:38:42 INFO dfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/root/input/hadoop-env.sh could only be replicated to 0 nodes, instead of 1 Looks like your datanode didn't come up, anything in the logs?

Re: Failed to repeat the Quickstart guide for Pseudo-distributed operation

2008-07-08 Thread Deepak Diwakar
You need to delete the hadoop-root directory which has been created through DFS. Usually Hadoop creates this directory in /tmp/. After deleting the directory, just follow the instructions once again. It will work. 2008/7/9 Arun C Murthy [EMAIL PROTECTED]: # bin/hadoop dfs -put conf input

File permissions issue

2008-07-08 Thread Joman Chu
Hello, On a cluster where I run Hadoop, it seems that the temp directory created by Hadoop (in our case, /tmp/hadoop/) gets its permissions set to drwxrwxr-x owned by the first person that runs a job after the Hadoop services are started. This causes file permissions problems as we try to run