> spilling queue and sorted spilling queue, can we inject the partitioning
> superstep as the first superstep and use local memory?
Actually, I wanted to add something before calling the BSP.setup() method, to
avoid executing an additional BSP job. But, in my opinion, the current
approach is enough. I think we need to collect more experience with input
partitioning on large environments. I'll do that. BTW, I still don't know why
it needs to be sorted?! MR-like?

On Thu, Feb 28, 2013 at 11:20 PM, Suraj Menon <[email protected]> wrote:
> Sorry, I am increasing the scope here to outside the graph module. When we
> have spilling queue and sorted spilling queue, can we inject the
> partitioning superstep as the first superstep and use local memory?
> Today we have a partitioning job within a job and are creating two copies
> of data on HDFS. This could be really costly. Is it possible to create or
> redistribute the partitions on local memory and initialize the record
> reader there?
> The user can run a separate job given in the examples area to explicitly
> repartition the data on HDFS. The deployment question is how much disk
> space gets allocated for local memory usage? Would it be a safe approach
> with the limitations?
>
> -Suraj
>
> On Thu, Feb 28, 2013 at 7:05 AM, Thomas Jungblut
> <[email protected]> wrote:
>
>> yes. Once Suraj added merging of sorted files we can add this to the
>> partitioner pretty easily.
>>
>> 2013/2/28 Edward J. Yoon <[email protected]>
>>
>> > Eh,..... btw, is re-partitioned data really necessary to be Sorted?
>> >
>> > On Thu, Feb 28, 2013 at 7:48 PM, Thomas Jungblut
>> > <[email protected]> wrote:
>> > > Now I get how the partitioning works; obviously, if you merge n
>> > > sorted files by just appending them to each other, this will result
>> > > in totally unsorted data ;-)
>> > > Why didn't you solve this via messaging?
>> > >
>> > > 2013/2/28 Thomas Jungblut <[email protected]>
>> > >
>> > >> Seems that they are not correctly sorted:
>> > >>
>> > >> vertexID: 50
>> > >> vertexID: 52
>> > >> vertexID: 54
>> > >> vertexID: 56
>> > >> vertexID: 58
>> > >> vertexID: 61
>> > >> ...
>> > >> vertexID: 78
>> > >> vertexID: 81
>> > >> vertexID: 83
>> > >> vertexID: 85
>> > >> ...
>> > >> vertexID: 94
>> > >> vertexID: 96
>> > >> vertexID: 98
>> > >> vertexID: 1
>> > >> vertexID: 10
>> > >> vertexID: 12
>> > >> vertexID: 14
>> > >> vertexID: 16
>> > >> vertexID: 18
>> > >> vertexID: 21
>> > >> vertexID: 23
>> > >> vertexID: 25
>> > >> vertexID: 27
>> > >> vertexID: 29
>> > >> vertexID: 3
>> > >>
>> > >> So this won't work correctly then...
>> > >>
>> > >> 2013/2/28 Thomas Jungblut <[email protected]>
>> > >>
>> > >>> sure, have fun on your holidays.
>> > >>>
>> > >>> 2013/2/28 Edward J. Yoon <[email protected]>
>> > >>>
>> > >>>> Sure, but if you can fix it quickly, please do. March 1 is a
>> > >>>> holiday [1], so I'll appear next week.
>> > >>>>
>> > >>>> 1. http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea
>> > >>>>
>> > >>>> On Thu, Feb 28, 2013 at 6:36 PM, Thomas Jungblut
>> > >>>> <[email protected]> wrote:
>> > >>>> > Maybe 50 is missing from the file, didn't observe if all items
>> > >>>> > were added.
>> > >>>> > As far as I remember, I copy/pasted the logic of the ID into
>> > >>>> > the fastgen, want to have a look into it?
>> > >>>> >
>> > >>>> > 2013/2/28 Edward J. Yoon <[email protected]>
>> > >>>> >
>> > >>>> >> I guess it's a bug of fastgen when it generates the adjacency
>> > >>>> >> matrix into multiple files.
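To illustrate Thomas's point quoted above: appending per-task sorted files
only gives locally sorted runs, which is exactly the shape of the listing
(the IDs restart at 1 right after 98). A globally sorted result needs a
k-way merge that always emits the smallest head element across all runs.
Below is a minimal, self-contained sketch in plain Java, assuming
line-oriented partition files with one numeric vertex ID per line;
SortedRunMerger and mergeSorted are made-up names for illustration, not
Hama's partitioner or merger API.

  import java.io.BufferedReader;
  import java.io.FileReader;
  import java.io.IOException;
  import java.util.PriorityQueue;

  // Hypothetical sketch: merge k locally sorted runs into one globally
  // sorted stream. Assumes each input file holds one numeric vertex ID
  // per line, already sorted within that file.
  public class SortedRunMerger {

    // One open run plus the last ID read from it.
    static final class Run implements Comparable<Run> {
      final BufferedReader reader;
      long currentId;

      Run(BufferedReader reader, long firstId) {
        this.reader = reader;
        this.currentId = firstId;
      }

      public int compareTo(Run other) {
        return Long.compare(currentId, other.currentId);
      }
    }

    public static void mergeSorted(String... files) throws IOException {
      PriorityQueue<Run> heap = new PriorityQueue<Run>();
      for (String file : files) {
        BufferedReader reader = new BufferedReader(new FileReader(file));
        String line = reader.readLine();
        if (line != null) {
          heap.add(new Run(reader, Long.parseLong(line.trim())));
        } else {
          reader.close();
        }
      }
      while (!heap.isEmpty()) {
        Run smallest = heap.poll();          // globally smallest remaining ID
        System.out.println("vertexID: " + smallest.currentId);
        String line = smallest.reader.readLine();
        if (line != null) {
          smallest.currentId = Long.parseLong(line.trim());
          heap.add(smallest);                // re-insert with its next ID
        } else {
          smallest.reader.close();           // this run is exhausted
        }
      }
    }

    public static void main(String[] args) throws IOException {
      // e.g. java SortedRunMerger part-00000 part-00001
      mergeSorted(args);
    }
  }

Whether this merge belongs in the partitioner or whether the redistribution
should instead happen via messaging, as Thomas asks, is the open question in
this thread.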
>> > >>>> >>
>> > >>>> >> On Thu, Feb 28, 2013 at 6:29 PM, Thomas Jungblut
>> > >>>> >> <[email protected]> wrote:
>> > >>>> >> > You have two files, are they partitioned correctly?
>> > >>>> >> >
>> > >>>> >> > 2013/2/28 Edward J. Yoon <[email protected]>
>> > >>>> >> >
>> > >>>> >> >> It looks like a bug.
>> > >>>> >> >>
>> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/
>> > >>>> >> >> total 44
>> > >>>> >> >> drwxrwxr-x 3 edward edward 4096 2월 28 18:03 .
>> > >>>> >> >> drwxrwxrwt 19 root root 20480 2월 28 18:04 ..
>> > >>>> >> >> -rwxrwxrwx 1 edward edward 2243 2월 28 18:01 part-00000
>> > >>>> >> >> -rw-rw-r-- 1 edward edward 28 2월 28 18:01 .part-00000.crc
>> > >>>> >> >> -rwxrwxrwx 1 edward edward 2251 2월 28 18:01 part-00001
>> > >>>> >> >> -rw-rw-r-- 1 edward edward 28 2월 28 18:01 .part-00001.crc
>> > >>>> >> >> drwxrwxr-x 2 edward edward 4096 2월 28 18:03 partitions
>> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/partitions/
>> > >>>> >> >> total 24
>> > >>>> >> >> drwxrwxr-x 2 edward edward 4096 2월 28 18:03 .
>> > >>>> >> >> drwxrwxr-x 3 edward edward 4096 2월 28 18:03 ..
>> > >>>> >> >> -rwxrwxrwx 1 edward edward 2932 2월 28 18:03 part-00000
>> > >>>> >> >> -rw-rw-r-- 1 edward edward 32 2월 28 18:03 .part-00000.crc
>> > >>>> >> >> -rwxrwxrwx 1 edward edward 2955 2월 28 18:03 part-00001
>> > >>>> >> >> -rw-rw-r-- 1 edward edward 32 2월 28 18:03 .part-00001.crc
>> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$
>> > >>>> >> >>
>> > >>>> >> >> On Thu, Feb 28, 2013 at 5:27 PM, Edward <[email protected]> wrote:
>> > >>>> >> >> > yes i'll check again
>> > >>>> >> >> >
>> > >>>> >> >> > Sent from my iPhone
>> > >>>> >> >> >
>> > >>>> >> >> > On Feb 28, 2013, at 5:18 PM, Thomas Jungblut
>> > >>>> >> >> > <[email protected]> wrote:
>> > >>>> >> >> >
>> > >>>> >> >> >> Can you verify an observation for me please?
>> > >>>> >> >> >>
>> > >>>> >> >> >> 2 files are created from fastgen, part-00000 and part-00001,
>> > >>>> >> >> >> both ~2.2kb sized.
>> > >>>> >> >> >> In the below partition directory, there is only a single
>> > >>>> >> >> >> 5.56kb file.
>> > >>>> >> >> >>
>> > >>>> >> >> >> Is it intended for the partitioner to write a single file if
>> > >>>> >> >> >> you configured two?
>> > >>>> >> >> >> It even reads it as two files, strange huh?
>> > >>>> >> >> >>
>> > >>>> >> >> >> 2013/2/28 Thomas Jungblut <[email protected]>
>> > >>>> >> >> >>
>> > >>>> >> >> >>> Will have a look into it.
>> > >>>> >> >> >>>
>> > >>>> >> >> >>> gen fastgen 100 10 /tmp/randomgraph 1
>> > >>>> >> >> >>> pagerank /tmp/randomgraph /tmp/pageout
>> > >>>> >> >> >>>
>> > >>>> >> >> >>> did work for me the last time I profiled, maybe the
>> > >>>> >> >> >>> partitioning doesn't partition correctly with the input or
>> > >>>> >> >> >>> something else.
>> > >>>> >> >> >>>
>> > >>>> >> >> >>> 2013/2/28 Edward J. Yoon <[email protected]>
>> > >>>> >> >> >>>
>> > >>>> >> >> >>>> Fastgen input seems not to work for graph examples.
>> > >>>> >> >> >>>>
>> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar
>> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar gen fastgen 100 10
>> > >>>> >> >> >>>> /tmp/randomgraph 2
>> > >>>> >> >> >>>> 13/02/28 10:32:02 WARN util.NativeCodeLoader: Unable to load native-hadoop
>> > >>>> >> >> >>>> library for your platform... using builtin-java classes where applicable
>> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
>> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
>> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Current supersteps number: 0
>> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: The total number of supersteps: 0
>> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Counters: 3
>> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
>> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     SUPERSTEPS=0
>> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=2
>> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     TASK_OUTPUT_RECORDS=100
>> > >>>> >> >> >>>> Job Finished in 3.212 seconds
>> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar
>> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT
>> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT-javadoc.jar
>> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT.jar
>> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar
>> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar pagerank
>> > >>>> >> >> >>>> /tmp/randomgraph /tmp/pageour
>> > >>>> >> >> >>>> 13/02/28 10:32:29 WARN util.NativeCodeLoader: Unable to load native-hadoop
>> > >>>> >> >> >>>> library for your platform... using builtin-java classes where applicable
>> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total input paths to process : 2
>> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total input paths to process : 2
>> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
>> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Current supersteps number: 1
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: The total number of supersteps: 1
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Counters: 6
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     SUPERSTEPS=1
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=2
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=4
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     IO_BYTES_READ=4332
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=14
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     TASK_INPUT_RECORDS=100
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.FileInputFormat: Total input paths to process : 2
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50 vertices are loaded into local:1
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50 vertices are loaded into local:0
>> > >>>> >> >> >>>> 13/02/28 10:32:33 ERROR bsp.LocalBSPRunner: Exception during BSP execution!
>> > >>>> >> >> >>>> java.lang.IllegalArgumentException: Messages must never be behind the vertex
>> > >>>> >> >> >>>> in ID! Current Message ID: 1 vs. 50
>> > >>>> >> >> >>>>     at org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
>> > >>>> >> >> >>>>     at org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
>> > >>>> >> >> >>>>     at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
>> > >>>> >> >> >>>>     at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
>> > >>>> >> >> >>>>     at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
>> > >>>> >> >> >>>>     at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
>> > >>>> >> >> >>>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>> > >>>> >> >> >>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>> > >>>> >> >> >>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> > >>>> >> >> >>>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>> > >>>> >> >> >>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>> > >>>> >> >> >>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> > >>>> >> >> >>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> > >>>> >> >> >>>>     at java.lang.Thread.run(Thread.java:722)
>> > >>>> >> >> >>>>
>> > >>>> >> >> >>>> --
>> > >>>> >> >> >>>> Best Regards, Edward J. Yoon
>> > >>>> >> >> >>>> @eddieyoon
>> > >>>> >> >>
>> > >>>> >> >> --
>> > >>>> >> >> Best Regards, Edward J. Yoon
>> > >>>> >> >> @eddieyoon
>> > >>>> >>
>> > >>>> >> --
>> > >>>> >> Best Regards, Edward J. Yoon
>> > >>>> >> @eddieyoon
>> > >>>>
>> > >>>> --
>> > >>>> Best Regards, Edward J. Yoon
>> > >>>> @eddieyoon
>> >
>> > --
>> > Best Regards, Edward J. Yoon
>> > @eddieyoon
>>

--
Best Regards, Edward J. Yoon
@eddieyoon
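A footnote on the ordering itself: the listing above (..., 96, 98, 1, 10,
12, ..., 29, 3) and the IllegalArgumentException thrown in
GraphJobRunner.iterate() would also be consistent with the numeric vertex
IDs being compared as text somewhere between fastgen, the partitioner, and
the runner. That is only a guess from the output shown here, but it is easy
to check; IdOrderCheck below is a made-up, standalone Java snippet using the
IDs that appear in the listing, not part of Hama.

  import java.util.ArrayList;
  import java.util.Arrays;
  import java.util.Collections;
  import java.util.List;

  // Hypothetical check: comparing numeric IDs as text reproduces the
  // "1, 10, 12, ..., 29, 3" order from the listing, while numeric
  // comparison gives a strictly increasing order.
  public class IdOrderCheck {
    public static void main(String[] args) {
      List<Integer> ids = Arrays.asList(1, 3, 10, 12, 14, 16, 18, 21, 23, 25, 27, 29);

      List<String> asText = new ArrayList<String>();
      for (Integer id : ids) {
        asText.add(String.valueOf(id));
      }
      Collections.sort(asText);             // lexicographic (text) order
      System.out.println("text order:    " + asText);
      // -> [1, 10, 12, 14, 16, 18, 21, 23, 25, 27, 29, 3]

      List<Integer> numeric = new ArrayList<Integer>(ids);
      Collections.sort(numeric);            // numeric order
      System.out.println("numeric order: " + numeric);
      // -> [1, 3, 10, 12, 14, 16, 18, 21, 23, 25, 27, 29]
    }
  }

If something like that is happening, the first thing to verify would be that
the partitioner output and GraphJobRunner iterate over the same ID ordering
(and that sorted runs are merged rather than concatenated, as sketched
earlier) before the "Messages must never be behind the vertex in ID" check
runs.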
