No, the partitions we write locally need not be sorted; sorry for the confusion. Superstep injection is possible with the Superstep API, though a few enhancements are needed to make it simpler than when I last worked on it. We can then look into executing the partitioning superstep before the setup of the first superstep of the submitted job. I think it is feasible.
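Suraj's point that locally written partitions need not be sorted, and that partitioning could run as a step before the job's first user superstep, can be pictured with a small sketch. Everything here is a hypothetical illustration, not Hama's Superstep API: the class `LocalPartitioningStep`, the helpers `partitionOf` and `partition`, and the `Math.floorMod`-based hash partitioning are assumptions. Each record simply goes into an in-memory bucket for its target peer, in arrival order.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not Hama's Superstep API: a partitioning step that runs
 * before the user's first superstep and routes each input record into an
 * in-memory bucket for the peer assumed to own it. Buckets keep arrival order;
 * nothing is sorted.
 */
public class LocalPartitioningStep {

  /** Assumed hash partitioner; floorMod keeps the result in [0, numPeers). */
  static int partitionOf(String vertexId, int numPeers) {
    return Math.floorMod(vertexId.hashCode(), numPeers);
  }

  /** Splits the records into numPeers unsorted buckets (numPeers must be > 0). */
  static List<List<String>> partition(List<String> vertexIds, int numPeers) {
    List<List<String>> buckets = new ArrayList<>(numPeers);
    for (int i = 0; i < numPeers; i++) {
      buckets.add(new ArrayList<>());
    }
    for (String id : vertexIds) {
      buckets.get(partitionOf(id, numPeers)).add(id);
    }
    return buckets;
  }

  public static void main(String[] args) {
    List<String> input = List.of("50", "1", "52", "10", "3", "98");
    List<List<String>> buckets = partition(input, 2);
    for (int peer = 0; peer < buckets.size(); peer++) {
      // In a real job each bucket would presumably be sent to its peer and the
      // record reader initialized from what that peer received.
      System.out.println("peer " + peer + ": " + buckets.get(peer));
    }
  }
}
```

Because nothing is sorted, the step is a single linear pass; any ordering a later stage needs would have to be established separately.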
On Tue, Mar 5, 2013 at 5:48 AM, Edward J. Yoon wrote:

> spilling queue and sorted spilling queue, can we inject the partitioning
> superstep as the first superstep and use local memory?

Actually, I wanted to add something before calling the BSP.setup() method to avoid executing an additional BSP job. But in my opinion, the current approach is enough. I think we need to collect more experience with input partitioning in large environments. I'll do that.

BTW, I still don't know why it needs to be sorted?! MR-like?

On Thu, Feb 28, 2013 at 11:20 PM, Suraj Menon wrote:

Sorry, I am increasing the scope here to outside the graph module. When we have the spilling queue and the sorted spilling queue, can we inject the partitioning superstep as the first superstep and use local memory?
Today we have a partitioning job within a job and are creating two copies of the data on HDFS. This could be really costly. Is it possible to create or redistribute the partitions in local memory and initialize the record reader there?
The user can run a separate job, given in the examples area, to explicitly repartition the data on HDFS. The deployment question is how much disk space gets allocated for local memory usage. Would it be a safe approach given those limitations?

-Suraj

On Thu, Feb 28, 2013 at 7:05 AM, Thomas Jungblut wrote:

Yes. Once Suraj has added merging of sorted files, we can add this to the partitioner pretty easily.

2013/2/28 Edward J. Yoon

Eh... btw, does re-partitioned data really need to be sorted?
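Thomas's remark about merging sorted files, and Edward's question about whether re-partitioned data must be sorted, come down to the difference between concatenating sorted runs and k-way merging them. A minimal sketch, assuming vertex IDs are compared as plain Strings (lexicographic order, so "3" sorts after "29"); the class `SortedRunMerge` is hypothetical and is not the merge code Suraj was working on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch: combining k individually sorted runs (e.g. one per
 * input file). IDs are compared as Strings, i.e. lexicographically.
 */
public class SortedRunMerge {

  /** Appends the runs: each run stays sorted internally, the result does not. */
  static List<String> append(List<List<String>> runs) {
    List<String> out = new ArrayList<>();
    for (List<String> run : runs) {
      out.addAll(run);
    }
    return out;
  }

  /** Heap entry: the current head of one run and its position in that run. */
  private static final class Cursor {
    final String value;
    final int run;
    final int index;

    Cursor(String value, int run, int index) {
      this.value = value;
      this.run = run;
      this.index = index;
    }
  }

  /** k-way merge with a min-heap: O(N log k) for N records in k runs. */
  static List<String> merge(List<List<String>> runs) {
    PriorityQueue<Cursor> heap = new PriorityQueue<>((a, b) -> a.value.compareTo(b.value));
    for (int r = 0; r < runs.size(); r++) {
      if (!runs.get(r).isEmpty()) {
        heap.add(new Cursor(runs.get(r).get(0), r, 0));
      }
    }
    List<String> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      Cursor c = heap.poll();
      out.add(c.value);
      List<String> run = runs.get(c.run);
      int next = c.index + 1;
      if (next < run.size()) {
        heap.add(new Cursor(run.get(next), c.run, next));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<List<String>> runs = List.of(
        List.of("50", "52", "98"),
        List.of("1", "10", "29", "3"));
    System.out.println("append: " + append(runs)); // append: [50, 52, 98, 1, 10, 29, 3]
    System.out.println("merge:  " + merge(runs));  // merge:  [1, 10, 29, 3, 50, 52, 98]
  }
}
```

The appended result has the same shape as the vertexID listing quoted in the thread (50 ... 98 followed by 1, 10, ... 3); within each run the order is lexicographic, which is why 3 follows 29.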
On Thu, Feb 28, 2013 at 7:48 PM, Thomas Jungblut wrote:

Now I get how the partitioning works: obviously, if you merge n sorted files by just appending them to each other, this will result in totally unsorted data ;-)
Why didn't you solve this via messaging?

2013/2/28 Thomas Jungblut

Seems that they are not correctly sorted:

vertexID: 50
vertexID: 52
vertexID: 54
vertexID: 56
vertexID: 58
vertexID: 61
...
vertexID: 78
vertexID: 81
vertexID: 83
vertexID: 85
...
vertexID: 94
vertexID: 96
vertexID: 98
vertexID: 1
vertexID: 10
vertexID: 12
vertexID: 14
vertexID: 16
vertexID: 18
vertexID: 21
vertexID: 23
vertexID: 25
vertexID: 27
vertexID: 29
vertexID: 3

So this won't work correctly then...

2013/2/28 Thomas Jungblut

Sure, have fun on your holidays.

2013/2/28 Edward J. Yoon

Sure, but if you can fix it quickly, please do. March 1 is a holiday [1], so I'll be back next week.

1. http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea

On Thu, Feb 28, 2013 at 6:36 PM, Thomas Jungblut wrote:

Maybe 50 is missing from the file; I didn't check whether all items were added.
As far as I remember, I copy/pasted the ID logic into fastgen. Want to have a look into it?

2013/2/28 Edward J. Yoon

I guess it's a bug in fastgen when it generates the adjacency matrix into multiple files.

On Thu, Feb 28, 2013 at 6:29 PM, Thomas Jungblut wrote:

You have two files, are they partitioned correctly?

2013/2/28 Edward J. Yoon

It looks like a bug.

edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/
total 44
drwxrwxr-x 3 edward edward 4096 Feb 28 18:03 .
drwxrwxrwt 19 root root 20480 Feb 28 18:04 ..
-rwxrwxrwx 1 edward edward 2243 Feb 28 18:01 part-00000
-rw-rw-r-- 1 edward edward 28 Feb 28 18:01 .part-00000.crc
-rwxrwxrwx 1 edward edward 2251 Feb 28 18:01 part-00001
-rw-rw-r-- 1 edward edward 28 Feb 28 18:01 .part-00001.crc
drwxrwxr-x 2 edward edward 4096 Feb 28 18:03 partitions
edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/partitions/
total 24
drwxrwxr-x 2 edward edward 4096 Feb 28 18:03 .
drwxrwxr-x 3 edward edward 4096 Feb 28 18:03 ..
-rwxrwxrwx 1 edward edward 2932 Feb 28 18:03 part-00000
-rw-rw-r-- 1 edward edward 32 Feb 28 18:03 .part-00000.crc
-rwxrwxrwx 1 edward edward 2955 Feb 28 18:03 part-00001
-rw-rw-r-- 1 edward edward 32 Feb 28 18:03 .part-00001.crc
edward@udanax:~/workspace/hama-trunk$

On Thu, Feb 28, 2013 at 5:27 PM, Edward wrote:

Yes, I'll check again.

Sent from my iPhone

On Feb 28, 2013, at 5:18 PM, Thomas Jungblut wrote:

Can you verify an observation for me please?

Two files are created by fastgen, part-00000 and part-00001, both ~2.2 KB in size.
In the partition directory below, there is only a single 5.56 KB file.

Is it intended for the partitioner to write a single file if you configured two?
It even reads it as two files, strange huh?

2013/2/28 Thomas Jungblut

Will have a look into it.

gen fastgen 100 10 /tmp/randomgraph 1
pagerank /tmp/randomgraph /tmp/pageout

did work for me the last time I profiled; maybe the partitioning doesn't partition the input correctly, or it is something else.

2013/2/28 Edward J. Yoon

Fastgen input does not seem to work for the graph examples.

edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar examples/target/hama-examples-0.7.0-SNAPSHOT.jar gen fastgen 100 10 /tmp/randomgraph 2
13/02/28 10:32:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/02/28 10:32:03 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
13/02/28 10:32:03 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
13/02/28 10:32:06 INFO bsp.BSPJobClient: Current supersteps number: 0
13/02/28 10:32:06 INFO bsp.BSPJobClient: The total number of supersteps: 0
13/02/28 10:32:06 INFO bsp.BSPJobClient: Counters: 3
13/02/28 10:32:06 INFO bsp.BSPJobClient: org.apache.hama.bsp.JobInProgress$JobCounter
13/02/28 10:32:06 INFO bsp.BSPJobClient: SUPERSTEPS=0
13/02/28 10:32:06 INFO bsp.BSPJobClient: LAUNCHED_TASKS=2
13/02/28 10:32:06 INFO bsp.BSPJobClient: org.apache.hama.bsp.BSPPeerImpl$PeerCounter
13/02/28 10:32:06 INFO bsp.BSPJobClient: TASK_OUTPUT_RECORDS=100
Job Finished in 3.212 seconds
edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar examples/target/hama-examples-0.7.0-SNAPSHOT
hama-examples-0.7.0-SNAPSHOT-javadoc.jar hama-examples-0.7.0-SNAPSHOT.jar
edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar examples/target/hama-examples-0.7.0-SNAPSHOT.jar pagerank /tmp/randomgraph /tmp/pageour
13/02/28 10:32:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/02/28 10:32:29 INFO bsp.FileInputFormat: Total input paths to process : 2
13/02/28 10:32:29 INFO bsp.FileInputFormat: Total input paths to process : 2
13/02/28 10:32:30 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
13/02/28 10:32:30 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
13/02/28 10:32:33 INFO bsp.BSPJobClient: Current supersteps number: 1
13/02/28 10:32:33 INFO bsp.BSPJobClient: The total number of supersteps: 1
13/02/28 10:32:33 INFO bsp.BSPJobClient: Counters: 6
13/02/28 10:32:33 INFO bsp.BSPJobClient: org.apache.hama.bsp.JobInProgress$JobCounter
13/02/28 10:32:33 INFO bsp.BSPJobClient: SUPERSTEPS=1
13/02/28 10:32:33 INFO bsp.BSPJobClient: LAUNCHED_TASKS=2
13/02/28 10:32:33 INFO bsp.BSPJobClient: org.apache.hama.bsp.BSPPeerImpl$PeerCounter
13/02/28 10:32:33 INFO bsp.BSPJobClient: SUPERSTEP_SUM=4
13/02/28 10:32:33 INFO bsp.BSPJobClient: IO_BYTES_READ=4332
13/02/28 10:32:33 INFO bsp.BSPJobClient: TIME_IN_SYNC_MS=14
13/02/28 10:32:33 INFO bsp.BSPJobClient: TASK_INPUT_RECORDS=100
13/02/28 10:32:33 INFO bsp.FileInputFormat: Total input paths to process : 2
13/02/28 10:32:33 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
13/02/28 10:32:33 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
13/02/28 10:32:33 INFO graph.GraphJobRunner: 50 vertices are loaded into local:1
13/02/28 10:32:33 INFO graph.GraphJobRunner: 50 vertices are loaded into local:0
13/02/28 10:32:33 ERROR bsp.LocalBSPRunner: Exception during BSP execution!
java.lang.IllegalArgumentException: Messages must never be behind the vertex in ID! Current Message ID: 1 vs. 50
    at org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
    at org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
    at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
    at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
    at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
    at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)

--
Best Regards, Edward J. Yoon
@eddieyoon
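The exception above suggests that GraphJobRunner.iterate() walks the local vertices and the incoming messages in a single forward pass and therefore needs both in the same sorted order. The following is a hypothetical model of that invariant, not Hama's code: the class `LockstepIteration`, the method `deliver`, and the String-ordered IDs are assumptions. It reproduces the same message when two individually sorted files are appended, so the walk starts at vertex 50 while the first message targets vertex 1.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model (not Hama's GraphJobRunner code) of a single forward pass
 * over local vertices and incoming messages. Both lists are assumed to be
 * sorted in the same order (String order here).
 */
public class LockstepIteration {

  /**
   * Returns, for each vertex in order, how many messages were addressed to it.
   * Throws if a message's target sorts before the current vertex, because a
   * forward-only walk can never go back to deliver it (a message for a missing
   * vertex that sorts between two local vertices trips the same check).
   * Messages targeting IDs after the last vertex are left unconsumed.
   */
  static List<Integer> deliver(List<String> vertexIds, List<String> messageTargets) {
    List<Integer> received = new ArrayList<>();
    int m = 0;
    for (String vertexId : vertexIds) {
      int count = 0;
      while (m < messageTargets.size()) {
        String target = messageTargets.get(m);
        int cmp = target.compareTo(vertexId);
        if (cmp < 0) {
          throw new IllegalArgumentException(
              "Messages must never be behind the vertex in ID! Current Message ID: "
                  + target + " vs. " + vertexId);
        }
        if (cmp > 0) {
          break; // this message belongs to a later vertex
        }
        count++;
        m++;
      }
      received.add(count);
    }
    return received;
  }

  public static void main(String[] args) {
    List<String> messages = List.of("1", "10", "50");
    // Vertices in the same sorted order as the messages: one pass delivers everything.
    System.out.println(deliver(List.of("1", "10", "50", "52"), messages)); // [1, 1, 1, 0]
    try {
      // Two individually sorted files appended: the walk starts at "50",
      // but the first message is addressed to "1".
      deliver(List.of("50", "52", "1", "10"), messages);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

A forward-only walk needs just one cursor per list, which is presumably why the ordering matters; the trade-off is that out-of-order input fails fast instead of silently dropping messages.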
