Yeah... you are having to modify scripts, which is not the best solution. Using the distributed cache is far more flexible, since you can put your jars wherever you want on HDFS, and you don't have to change any environment settings. Please let me know if you have any other questions.
-David

From: yavuz gokirmak <ygokir...@gmail.com>
Reply-To: "giraph-user@incubator.apache.org" <giraph-user@incubator.apache.org>
Date: Mon, 20 Feb 2012 00:40:41 -0600
To: "giraph-user@incubator.apache.org" <giraph-user@incubator.apache.org>
Subject: Re: how to use SimplePageRankVertex

Thank you, I will try the distributed cache. When I use the distributed cache, will the patches I have written become unnecessary?

On 20 February 2012 03:44, David Garcia <dgar...@potomacfusion.com> wrote:

So, if that's the case, it's possible that the TaskTracker process doesn't have the job on its classpath. Although you have added the jar to "a" classpath, I'm not certain that the TaskTracker will have it. There are several ways to address this.

1.) You could bring Hadoop down, and then adjust hadoop-env.sh to export the HADOOP_CLASSPATH environment variable to include your jar. This variable is commented out by default. If you are running in distributed mode, this means that you will have to copy this jar to every single machine, and probably change this script on every single machine too. Unless you are using something like Condor (or Puppet if you're hard core serious), this is a serious pain, and for changing MR jobs, totally overkill.

My personal preference is to use the distributed cache, and copy your jar to a location in HDFS:
http://hadoop.apache.org/common/docs/r0.18.3/mapred_tutorial.html#DistributedCache

Hope this helps.
________________________________________
From: yavuz gokirmak [ygokir...@gmail.com]
Sent: Sunday, February 19, 2012 2:19 AM
To: giraph-user@incubator.apache.org
Subject: Re: how to use SimplePageRankVertex

I am using a pseudo-distributed cluster.

On 19 February 2012 02:00, David Garcia <dgar...@potomacfusion.com> wrote:

Are you submitting this job to a pseudo-distributed cluster or a fully distributed cluster?

Sent from my HTC Inspire™ 4G on AT&T

----- Reply message -----
From: "yavuz gokirmak" <ygokir...@gmail.com>
To: "giraph-user@incubator.apache.org" <giraph-user@incubator.apache.org>
Subject: how to use SimplePageRankVertex
Date: Sat, Feb 18, 2012 2:04 pm

Thank you for the advice. I have a few more questions.

I have created a class named INTPageRankVertex, which is similar to SimplePageRankVertex, and generated a jar holding only INTPageRankVertex.java. I then tried to run it with the giraph command as below, but I get classpath errors:

    giraph INTPageRankVertex.jar org.test.INTPageRankVertex \
        -ip /user/hdfs/pagerankinput/graph.input \
        -op /user/hdfs/pagerankoutput/ \
        -w 1 \
        -if org.test.INTPageRankVertex.INTPageRankVertexInputFormat \
        -of org.test.INTPageRankVertex.INTPageRankVertexOutputFormat

First I get:

    Exception in thread "main" java.lang.ClassNotFoundException: org.test.INTPageRankVertex

In bin/giraph, the user jar is added to the classpath on line 58:

    58. CLASSPATH=${USER_JAR}

but CLASSPATH is overwritten on line 87:

    87. CLASSPATH=`mvn dependency:build-classpath | grep -v "[INFO]"`

Changing line 87 as below solves my first problem. Is this patch valid?

    87. CLASSPATH=$CLASSPATH:`mvn dependency:build-classpath | grep -v "[INFO]"`

After changing line 87 I get a different classpath error:

    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/giraph/graph/LongDoubleFloatDoubleVertex

I solved this problem by adding the line below:

    113. CLASSPATH=$CLASSPATH:$JAR

Are these patches necessary, or am I doing something wrong while running my code?

Best regards.

On 18 February 2012 18:37, Avery Ching <ach...@apache.org> wrote:

IntIntNullIntTextInputFormat in the examples package (extending TextVertexInputFormat, as David suggests) is very similar to what you need, I think, although the types might be different for your application. You can start with that, perhaps.

Avery

On 2/18/12 7:48 AM, David Garcia wrote:

The easiest thing to do is to extend the text vertex and/or TextVertexInputFormat, and/or the record reader. The record reader will give you the vertices you want. Look at the record reader for TextVertexInputFormat. It's an inner class of this format class.

Sent from my HTC Inspire™ 4G on AT&T

----- Reply message -----
From: "yavuz gokirmak" <ygokir...@gmail.com>
To: "giraph-user@incubator.apache.org" <giraph-user@incubator.apache.org>
Subject: how to use SimplePageRankVertex
Date: Sat, Feb 18, 2012 9:08 am

Hi,

I am planning to use Giraph for network analysis.
First, I am trying to fully understand the SimplePageRankVertex implementation and modify it to serve my needs. I have a question about the example: what is the expected input format for SimplePageRankVertex? I couldn't understand the input format, even though the SimplePageRankVertexReader class has only a few lines.

My input file consists of rows such as:

    usera, userb
    usera, userc
    userc, usera
    userb, userc
    userc, userb
    . . .

Each row represents a relation between two users: "usera,userb" means that usera clicked userb's profile. Is it possible to do social network analysis over this kind of data using Giraph?

I would be glad for any advice.

Thanks in advance,
best regards
ygokirmak
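
Editor's illustration of the question above: rows of this shape form a directed edge list, while text-based vertex readers typically work on one record per vertex together with its outgoing edges. The following is a minimal sketch in plain Python, not Giraph code, of grouping the rows by source user. The function name edges_to_adjacency, the two-field comma-separated row layout, and the one-line-per-vertex output are assumptions made for illustration only.

```python
def edges_to_adjacency(lines):
    """Group 'source, target' rows into {source: [targets...]}.

    Every user that appears, even only as a target, gets an entry,
    so vertices without outgoing edges are not lost.
    Assumes exactly two comma-separated fields per non-blank row.
    """
    adjacency = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank rows
        source, target = (field.strip() for field in line.split(","))
        adjacency.setdefault(source, []).append(target)
        adjacency.setdefault(target, [])  # make sure the target exists as a vertex
    return adjacency


# Sample rows copied from the message above.
rows = [
    "usera, userb",
    "usera, userc",
    "userc, usera",
    "userb, userc",
    "userc, userb",
]

adjacency = edges_to_adjacency(rows)
for user in sorted(adjacency):
    # One line per vertex: the id followed by its out-neighbours.
    print(user, " ".join(adjacency[user]))
```

Each printed line is the kind of per-vertex record that a custom reader, along the lines of the TextVertexInputFormat suggestion in the replies, could then parse into a vertex and its edges. The user names would still need to be mapped to whatever vertex id type the application chooses.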