Yes. Once Suraj has added merging of sorted files, we can add this to the partitioner pretty easily.
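
To make the point about merging concrete, here is a minimal sketch of a k-way merge over already-sorted runs, as opposed to simply appending them. It is not Hama's partitioner code: the class name `SortedRunMerger`, the `merge` method, and the use of in-memory lists in place of part files are all assumptions made for illustration. It also assumes vertex IDs are compared as strings, which is what the ordering in the listing quoted below (1, 10, 12, ..., 29, 3) suggests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Hama: merges sorted runs of vertex IDs.
final class SortedRunMerger {

  /** One open run plus the record currently at its head. */
  private static final class Head implements Comparable<Head> {
    final String id;
    final Iterator<String> rest;

    Head(String id, Iterator<String> rest) {
      this.id = id;
      this.rest = rest;
    }

    @Override
    public int compareTo(Head other) {
      // Assumption: IDs are ordered as strings (lexicographically).
      return id.compareTo(other.id);
    }
  }

  /** Merges runs that are each sorted into one sorted list. */
  static List<String> merge(List<List<String>> runs) {
    PriorityQueue<Head> heap = new PriorityQueue<>();
    for (List<String> run : runs) {
      Iterator<String> it = run.iterator();
      if (it.hasNext()) {
        heap.add(new Head(it.next(), it));
      }
    }
    List<String> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      // Always emit the smallest head across all runs, then refill from that run.
      Head smallest = heap.poll();
      out.add(smallest.id);
      if (smallest.rest.hasNext()) {
        heap.add(new Head(smallest.rest.next(), smallest.rest));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> first = Arrays.asList("50", "52", "54");
    List<String> second = Arrays.asList("1", "10", "12", "3");
    // Appending first + second would start at 50; merging keeps global order.
    System.out.println(merge(Arrays.asList(first, second)));
    // prints [1, 10, 12, 3, 50, 52, 54]
  }
}
```

With real part files the iterators would wrap file readers instead of lists, so only one record per file has to be held in memory at a time.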

2013/2/28 Edward J. Yoon <[email protected]>

> Eh,..... btw, is re-partitioned data really necessary to be Sorted?
>
> On Thu, Feb 28, 2013 at 7:48 PM, Thomas Jungblut
> <[email protected]> wrote:
> > Now I get how the partitioning works, obviously if you merge n sorted files
> > by just appending to each other, this will result in totally unsorted data
> > ;-)
> > Why didn't you solve this via messaging?
> >
> > 2013/2/28 Thomas Jungblut <[email protected]>
> >
> >> Seems that they are not correctly sorted:
> >>
> >> vertexID: 50
> >> vertexID: 52
> >> vertexID: 54
> >> vertexID: 56
> >> vertexID: 58
> >> vertexID: 61
> >> ...
> >> vertexID: 78
> >> vertexID: 81
> >> vertexID: 83
> >> vertexID: 85
> >> ...
> >> vertexID: 94
> >> vertexID: 96
> >> vertexID: 98
> >> vertexID: 1
> >> vertexID: 10
> >> vertexID: 12
> >> vertexID: 14
> >> vertexID: 16
> >> vertexID: 18
> >> vertexID: 21
> >> vertexID: 23
> >> vertexID: 25
> >> vertexID: 27
> >> vertexID: 29
> >> vertexID: 3
> >>
> >> So this won't work then correctly...
> >>
> >>
> >> 2013/2/28 Thomas Jungblut <[email protected]>
> >>
> >>> sure, have fun on your holidays.
> >>>
> >>>
> >>> 2013/2/28 Edward J. Yoon <[email protected]>
> >>>
> >>>> Sure, but if you can fix quickly, please do. March 1 is holiday[1] so
> >>>> I'll appear next week.
> >>>>
> >>>> 1. http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea
> >>>>
> >>>> On Thu, Feb 28, 2013 at 6:36 PM, Thomas Jungblut
> >>>> <[email protected]> wrote:
> >>>> > Maybe 50 is missing from the file, didn't observe if all items were added.
> >>>> > As far as I remember, I copy/pasted the logic of the ID into the fastgen,
> >>>> > want to have a look into it?
> >>>> >
> >>>> > 2013/2/28 Edward J. Yoon <[email protected]>
> >>>> >
> >>>> >> I guess, it's a bug of fastgen, when generate adjacency matrix into
> >>>> >> multiple files.
> >>>> >>
> >>>> >> On Thu, Feb 28, 2013 at 6:29 PM, Thomas Jungblut
> >>>> >> <[email protected]> wrote:
> >>>> >> > You have two files, are they partitioned correctly?
> >>>> >> >
> >>>> >> > 2013/2/28 Edward J. Yoon <[email protected]>
> >>>> >> >
> >>>> >> >> It looks like a bug.
> >>>> >> >>
> >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/
> >>>> >> >> total 44
> >>>> >> >> drwxrwxr-x  3 edward edward  4096 Feb 28 18:03 .
> >>>> >> >> drwxrwxrwt 19 root   root   20480 Feb 28 18:04 ..
> >>>> >> >> -rwxrwxrwx  1 edward edward  2243 Feb 28 18:01 part-00000
> >>>> >> >> -rw-rw-r--  1 edward edward    28 Feb 28 18:01 .part-00000.crc
> >>>> >> >> -rwxrwxrwx  1 edward edward  2251 Feb 28 18:01 part-00001
> >>>> >> >> -rw-rw-r--  1 edward edward    28 Feb 28 18:01 .part-00001.crc
> >>>> >> >> drwxrwxr-x  2 edward edward  4096 Feb 28 18:03 partitions
> >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/partitions/
> >>>> >> >> total 24
> >>>> >> >> drwxrwxr-x 2 edward edward 4096 Feb 28 18:03 .
> >>>> >> >> drwxrwxr-x 3 edward edward 4096 Feb 28 18:03 ..
> >>>> >> >> -rwxrwxrwx 1 edward edward 2932 Feb 28 18:03 part-00000
> >>>> >> >> -rw-rw-r-- 1 edward edward   32 Feb 28 18:03 .part-00000.crc
> >>>> >> >> -rwxrwxrwx 1 edward edward 2955 Feb 28 18:03 part-00001
> >>>> >> >> -rw-rw-r-- 1 edward edward   32 Feb 28 18:03 .part-00001.crc
> >>>> >> >> edward@udanax:~/workspace/hama-trunk$
> >>>> >> >>
> >>>> >> >>
> >>>> >> >> On Thu, Feb 28, 2013 at 5:27 PM, Edward <[email protected]> wrote:
> >>>> >> >> > yes i'll check again
> >>>> >> >> >
> >>>> >> >> > Sent from my iPhone
> >>>> >> >> >
> >>>> >> >> > On Feb 28, 2013, at 5:18 PM, Thomas Jungblut <[email protected]> wrote:
> >>>> >> >> >
> >>>> >> >> >> Can you verify an observation for me please?
> >>>> >> >> >>
> >>>> >> >> >> 2 files are created from fastgen, part-00000 and part-00001, both ~2.2kb sized.
> >>>> >> >> >> In the below partition directory, there is only a single 5.56kb file.
> >>>> >> >> >>
> >>>> >> >> >> Is it intended for the partitioner to write a single file if you configured two?
> >>>> >> >> >> It even reads it as a two files, strange huh?
> >>>> >> >> >>
> >>>> >> >> >> 2013/2/28 Thomas Jungblut <[email protected]>
> >>>> >> >> >>
> >>>> >> >> >>> Will have a look into it.
> >>>> >> >> >>>
> >>>> >> >> >>> gen fastgen 100 10 /tmp/randomgraph 1
> >>>> >> >> >>> pagerank /tmp/randomgraph /tmp/pageout
> >>>> >> >> >>>
> >>>> >> >> >>> did work for me the last time I profiled, maybe the partitioning doesn't
> >>>> >> >> >>> partition correctly with the input or something else.
> >>>> >> >> >>>
> >>>> >> >> >>>
> >>>> >> >> >>> 2013/2/28 Edward J. Yoon <[email protected]>
> >>>> >> >> >>>
> >>>> >> >> >>>> Fastgen input seems not work for graph examples.
> >>>> >> >> >>>>
> >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar examples/target/hama-examples-0.7.0-SNAPSHOT.jar gen fastgen 100 10 /tmp/randomgraph 2
> >>>> >> >> >>>> 13/02/28 10:32:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
> >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
> >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Current supersteps number: 0
> >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: The total number of supersteps: 0
> >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Counters: 3
> >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
> >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     SUPERSTEPS=0
> >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=2
> >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     TASK_OUTPUT_RECORDS=100
> >>>> >> >> >>>> Job Finished in 3.212 seconds
> >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar examples/target/hama-examples-0.7.0-SNAPSHOT
> >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT-javadoc.jar
> >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT.jar
> >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar examples/target/hama-examples-0.7.0-SNAPSHOT.jar pagerank /tmp/randomgraph /tmp/pageour
> >>>> >> >> >>>> 13/02/28 10:32:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total input paths to process : 2
> >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total input paths to process : 2
> >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
> >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
> >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Current supersteps number: 1
> >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: The total number of supersteps: 1
> >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Counters: 6
> >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
> >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     SUPERSTEPS=1
> >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=2
> >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=4
> >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     IO_BYTES_READ=4332
> >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=14
> >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     TASK_INPUT_RECORDS=100
> >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.FileInputFormat: Total input paths to process : 2
> >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
> >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
> >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50 vertices are loaded into local:1
> >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50 vertices are loaded into local:0
> >>>> >> >> >>>> 13/02/28 10:32:33 ERROR bsp.LocalBSPRunner: Exception during BSP execution!
> >>>> >> >> >>>> java.lang.IllegalArgumentException: Messages must never be behind the vertex in ID! Current Message ID: 1 vs. 50
> >>>> >> >> >>>>     at org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
> >>>> >> >> >>>>     at org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
> >>>> >> >> >>>>     at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
> >>>> >> >> >>>>     at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
> >>>> >> >> >>>>     at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
> >>>> >> >> >>>>     at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
> >>>> >> >> >>>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >>>> >> >> >>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >>>> >> >> >>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >>>> >> >> >>>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >>>> >> >> >>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >>>> >> >> >>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> >>>> >> >> >>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> >>>> >> >> >>>>     at java.lang.Thread.run(Thread.java:722)
> >>>> >> >> >>>>
> >>>> >> >> >>>>
> >>>> >> >> >>>> --
> >>>> >> >> >>>> Best Regards, Edward J. Yoon
> >>>> >> >> >>>> @eddieyoon
> >>>> >> >> >>>
> >>>> >> >> >>>
> >>>> >> >>
> >>>> >> >>
> >>>> >> >>
> >>>> >> >> --
> >>>> >> >> Best Regards, Edward J. Yoon
> >>>> >> >> @eddieyoon
> >>>> >> >>
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >> --
> >>>> >> Best Regards, Edward J. Yoon
> >>>> >> @eddieyoon
> >>>> >>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Best Regards, Edward J. Yoon
> >>>> @eddieyoon
> >>>>
> >>>
> >>>
> >>
> >
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>
