Now I see how the partitioning works: obviously, if you merge n sorted files by simply appending them to one another, the result is totally unsorted data ;-) Why didn't you solve this via messaging?
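To make the appending problem concrete, here is a minimal sketch (plain JDK, not Hama's actual partitioner code) of a k-way merge of sorted runs via a priority queue; appending the runs instead would interleave their key ranges, exactly as the partition output below shows:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class KWayMerge {
  // Merge n individually sorted lists into one sorted list using a min-heap.
  // Simply concatenating the lists is only correct if every key in run i
  // precedes every key in run i+1, which range-partitioned output
  // generally does not guarantee.
  public static List<Integer> merge(List<List<Integer>> runs) {
    // heap entries: {value, runIndex, offsetWithinRun}
    PriorityQueue<int[]> heap =
        new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[0]));
    for (int i = 0; i < runs.size(); i++) {
      if (!runs.get(i).isEmpty()) {
        heap.add(new int[] { runs.get(i).get(0), i, 0 });
      }
    }
    List<Integer> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      int[] e = heap.poll();          // smallest head among all runs
      out.add(e[0]);
      List<Integer> run = runs.get(e[1]);
      if (e[2] + 1 < run.size()) {    // advance within the same run
        heap.add(new int[] { run.get(e[2] + 1), e[1], e[2] + 1 });
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Integer> a = Arrays.asList(50, 52, 94);
    List<Integer> b = Arrays.asList(1, 10, 98);
    System.out.println(merge(Arrays.asList(a, b)));
    // prints [1, 10, 50, 52, 94, 98]
  }
}
```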
2013/2/28 Thomas Jungblut <[email protected]>

> Seems that they are not correctly sorted:
>
> vertexID: 50
> vertexID: 52
> vertexID: 54
> vertexID: 56
> vertexID: 58
> vertexID: 61
> ...
> vertexID: 78
> vertexID: 81
> vertexID: 83
> vertexID: 85
> ...
> vertexID: 94
> vertexID: 96
> vertexID: 98
> vertexID: 1
> vertexID: 10
> vertexID: 12
> vertexID: 14
> vertexID: 16
> vertexID: 18
> vertexID: 21
> vertexID: 23
> vertexID: 25
> vertexID: 27
> vertexID: 29
> vertexID: 3
>
> So this won't work correctly then...
>
> 2013/2/28 Thomas Jungblut <[email protected]>
>
>> sure, have fun on your holidays.
>>
>> 2013/2/28 Edward J. Yoon <[email protected]>
>>
>>> Sure, but if you can fix it quickly, please do. March 1 is a holiday[1],
>>> so I'll appear next week.
>>>
>>> 1. http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea
>>>
>>> On Thu, Feb 28, 2013 at 6:36 PM, Thomas Jungblut
>>> <[email protected]> wrote:
>>> > Maybe 50 is missing from the file; I didn't observe whether all items
>>> > were added. As far as I remember, I copy/pasted the ID logic into the
>>> > fastgen. Do you want to have a look into it?
>>> >
>>> > 2013/2/28 Edward J. Yoon <[email protected]>
>>> >
>>> >> I guess it's a bug of fastgen when it generates the adjacency matrix
>>> >> into multiple files.
>>> >>
>>> >> On Thu, Feb 28, 2013 at 6:29 PM, Thomas Jungblut
>>> >> <[email protected]> wrote:
>>> >> > You have two files, are they partitioned correctly?
>>> >> >
>>> >> > 2013/2/28 Edward J. Yoon <[email protected]>
>>> >> >
>>> >> >> It looks like a bug.
>>> >> >>
>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/
>>> >> >> total 44
>>> >> >> drwxrwxr-x  3 edward edward  4096  2월 28 18:03 .
>>> >> >> drwxrwxrwt 19 root   root   20480  2월 28 18:04 ..
>>> >> >> -rwxrwxrwx  1 edward edward  2243  2월 28 18:01 part-00000
>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01 .part-00000.crc
>>> >> >> -rwxrwxrwx  1 edward edward  2251  2월 28 18:01 part-00001
>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01 .part-00001.crc
>>> >> >> drwxrwxr-x  2 edward edward  4096  2월 28 18:03 partitions
>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/partitions/
>>> >> >> total 24
>>> >> >> drwxrwxr-x 2 edward edward 4096  2월 28 18:03 .
>>> >> >> drwxrwxr-x 3 edward edward 4096  2월 28 18:03 ..
>>> >> >> -rwxrwxrwx 1 edward edward 2932  2월 28 18:03 part-00000
>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03 .part-00000.crc
>>> >> >> -rwxrwxrwx 1 edward edward 2955  2월 28 18:03 part-00001
>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03 .part-00001.crc
>>> >> >> edward@udanax:~/workspace/hama-trunk$
>>> >> >>
>>> >> >> On Thu, Feb 28, 2013 at 5:27 PM, Edward <[email protected]> wrote:
>>> >> >> > yes, i'll check again
>>> >> >> >
>>> >> >> > Sent from my iPhone
>>> >> >> >
>>> >> >> > On Feb 28, 2013, at 5:18 PM, Thomas Jungblut
>>> >> >> > <[email protected]> wrote:
>>> >> >> >
>>> >> >> >> Can you verify an observation for me, please?
>>> >> >> >>
>>> >> >> >> 2 files are created from fastgen, part-00000 and part-00001,
>>> >> >> >> both ~2.2kb in size.
>>> >> >> >> In the partition directory below, there is only a single
>>> >> >> >> 5.56kb file.
>>> >> >> >>
>>> >> >> >> Is it intended for the partitioner to write a single file if
>>> >> >> >> you configured two?
>>> >> >> >> It even reads it as two files, strange huh?
>>> >> >> >>
>>> >> >> >> 2013/2/28 Thomas Jungblut <[email protected]>
>>> >> >> >>
>>> >> >> >>> Will have a look into it.
>>> >> >> >>>
>>> >> >> >>> gen fastgen 100 10 /tmp/randomgraph 1
>>> >> >> >>> pagerank /tmp/randomgraph /tmp/pageout
>>> >> >> >>>
>>> >> >> >>> did work for me the last time I profiled; maybe the partitioning
>>> >> >> >>> doesn't partition correctly with the input, or something else.
>>> >> >> >>>
>>> >> >> >>> 2013/2/28 Edward J. Yoon <[email protected]>
>>> >> >> >>>
>>> >> >> >>>> Fastgen input seems not to work for the graph examples.
>>> >> >> >>>>
>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar
>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar gen fastgen 100 10
>>> >> >> >>>> /tmp/randomgraph 2
>>> >> >> >>>> 13/02/28 10:32:02 WARN util.NativeCodeLoader: Unable to load
>>> >> >> >>>> native-hadoop library for your platform... using builtin-java
>>> >> >> >>>> classes where applicable
>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Current supersteps number: 0
>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: The total number of supersteps: 0
>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Counters: 3
>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     SUPERSTEPS=0
>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=2
>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     TASK_OUTPUT_RECORDS=100
>>> >> >> >>>> Job Finished in 3.212 seconds
>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar
>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT
>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT-javadoc.jar
>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT.jar
>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar
>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar pagerank
>>> >> >> >>>> /tmp/randomgraph /tmp/pageour
>>> >> >> >>>> 13/02/28 10:32:29 WARN util.NativeCodeLoader: Unable to load
>>> >> >> >>>> native-hadoop library for your platform... using builtin-java
>>> >> >> >>>> classes where applicable
>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total input paths to process : 2
>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total input paths to process : 2
>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Current supersteps number: 1
>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: The total number of supersteps: 1
>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Counters: 6
>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     SUPERSTEPS=1
>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=2
>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=4
>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     IO_BYTES_READ=4332
>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=14
>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     TASK_INPUT_RECORDS=100
>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.FileInputFormat: Total input paths to process : 2
>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50 vertices are loaded into local:1
>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50 vertices are loaded into local:0
>>> >> >> >>>> 13/02/28 10:32:33 ERROR bsp.LocalBSPRunner: Exception during BSP execution!
>>> >> >> >>>> java.lang.IllegalArgumentException: Messages must never be behind the
>>> >> >> >>>> vertex in ID! Current Message ID: 1 vs.
>>> >> >> >>>> 50
>>> >> >> >>>>         at org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
>>> >> >> >>>>         at org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
>>> >> >> >>>>         at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
>>> >> >> >>>>         at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
>>> >> >> >>>>         at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
>>> >> >> >>>>         at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
>>> >> >> >>>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>> >> >> >>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>> >> >> >>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>> >> >> >>>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>> >> >> >>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>> >> >> >>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>> >> >> >>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>> >> >> >>>>         at java.lang.Thread.run(Thread.java:722)
>>> >> >> >>>>
>>> >> >> >>>> --
>>> >> >> >>>> Best Regards, Edward J. Yoon
>>> >> >> >>>> @eddieyoon
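One observation on the vertexID listing at the top of the thread (my reading, illustrated with plain JDK strings rather than Hama's actual key types): within each run the IDs are in lexicographic order (29 sorts before 3), which is exactly what you get when numeric IDs are compared as strings:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class StringSortDemo {
  public static void main(String[] args) {
    // Numeric vertex IDs compared as strings sort lexicographically:
    // "10" < "3" because '1' < '3' at the first character.
    List<String> ids = new ArrayList<>(
        Arrays.asList("1", "3", "10", "12", "21", "29"));
    Collections.sort(ids);
    System.out.println(ids); // prints [1, 10, 12, 21, 29, 3]
  }
}
```

That matches the run `1, 10, 12, 14, 16, 18, 21, 23, 25, 27, 29, 3` in the listing, so the generator's per-file order is string order, not numeric order.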
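The IllegalArgumentException above is what you would expect if the graph runner merge-joins the sorted vertex list against the sorted incoming message stream: a message whose ID sorts before the current vertex has already been passed and can never be delivered. A hypothetical sketch of such an invariant check (not the actual GraphJobRunner source):

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class MergeJoinSketch {
  // Walk sorted vertices and sorted messages in lockstep. If a message's
  // ID sorts below the current vertex's ID, its target vertex was already
  // passed, so the merge-join cannot deliver it and must fail.
  public static void deliver(List<String> sortedVertexIds,
                             List<String> sortedMessageIds) {
    Iterator<String> msgs = sortedMessageIds.iterator();
    String msg = msgs.hasNext() ? msgs.next() : null;
    for (String vertex : sortedVertexIds) {
      while (msg != null && msg.compareTo(vertex) <= 0) {
        if (msg.compareTo(vertex) < 0) {
          throw new IllegalArgumentException(
              "Messages must never be behind the vertex in ID! "
                  + "Current Message ID: " + msg + " vs. " + vertex);
        }
        // msg equals vertex: a real implementation delivers it here.
        msg = msgs.hasNext() ? msgs.next() : null;
      }
    }
  }

  public static void main(String[] args) {
    try {
      // Vertices from an unsorted partition start at "50"; a message
      // addressed to vertex "1" is then "behind" and trips the check.
      deliver(Arrays.asList("50", "52"), Arrays.asList("1"));
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With correctly sorted partitions the message stream never falls behind the vertex cursor, so the check never fires; unsorted partition files break the precondition and reproduce the "1 vs. 50" failure in the log.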
