Sorry, I am increasing the scope here beyond the graph module. Since we
have a spilling queue and a sorted spilling queue, can we inject the
partitioning superstep as the first superstep and use local memory?

Today we have a partitioning job within a job, and it creates two copies
of the data on HDFS. This could be really costly. Is it possible to
create or redistribute the partitions in local memory and initialize the
record reader there?

The user can run the separate job given in the examples area to
explicitly repartition the data on HDFS. The deployment question is: how
much disk space gets allocated for local memory usage? Would this be a
safe approach given those limitations?
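
To make the proposal concrete, here is a minimal sketch of hash-partitioning records into per-peer local queues and initializing the reader from them, instead of materializing a second copy on HDFS. The class and method names are hypothetical (not the existing Hama partitioner API), and the in-memory queues stand in for spilling queues that would spill to local disk past a memory threshold:

```java
import java.util.*;

public class LocalRepartitionSketch {
    // Hypothetical in-memory stand-ins for spilling queues; a real
    // implementation would spill to local disk once memory fills up.
    static List<Queue<String>> partition(List<String> records, int numPeers) {
        List<Queue<String>> queues = new ArrayList<>();
        for (int i = 0; i < numPeers; i++) {
            queues.add(new ArrayDeque<>());
        }
        for (String record : records) {
            // Adjacency-list input: the vertex ID before the tab is the key.
            String key = record.split("\t", 2)[0];
            int p = (key.hashCode() & Integer.MAX_VALUE) % numPeers;
            queues.get(p).add(record); // "send" the record to its owning peer
        }
        return queues;
    }

    public static void main(String[] args) {
        // Each peer's queue would then back the record reader for the
        // first real superstep, with no extra HDFS copy.
        List<Queue<String>> parts = partition(
            Arrays.asList("1\t2 3", "2\t1", "3\t1 2", "4\t1"), 2);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("peer " + i + ": " + parts.get(i));
        }
    }
}
```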

-Suraj

On Thu, Feb 28, 2013 at 7:05 AM, Thomas Jungblut
<[email protected]> wrote:

> yes. Once Suraj added merging of sorted files we can add this to the
> partitioner pretty easily.
>
> 2013/2/28 Edward J. Yoon <[email protected]>
>
> > Eh... btw, does the re-partitioned data really need to be sorted?
> >
> > On Thu, Feb 28, 2013 at 7:48 PM, Thomas Jungblut
> > <[email protected]> wrote:
> > > Now I get how the partitioning works: obviously, if you merge n sorted
> > > files by just appending them to each other, this will result in totally
> > > unsorted data ;-)
> > > Why didn't you solve this via messaging?
> > >
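
Appending n sorted partition files does indeed break the global order; the usual fix is a k-way merge that always emits the smallest head element across all runs. A minimal sketch in plain Java, illustrative only and not Hama's actual merging code:

```java
import java.util.*;

public class KWayMerge {
    // Merge n individually sorted lists into one globally sorted list,
    // the same way sorted spill files would be merged on disk.
    public static List<Integer> merge(List<List<Integer>> runs) {
        // Heap entry: {value, runIndex, positionInRun}, ordered by value.
        PriorityQueue<int[]> heap =
            new PriorityQueue<>(Comparator.comparingInt(e -> e[0]));
        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) {
                heap.add(new int[] { runs.get(i).get(0), i, 0 });
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] e = heap.poll();              // smallest head among all runs
            out.add(e[0]);
            int next = e[2] + 1;                // advance within that run
            if (next < runs.get(e[1]).size()) {
                heap.add(new int[] { runs.get(e[1]).get(next), e[1], next });
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two sorted runs, like the two partition files in this thread.
        List<Integer> merged = merge(Arrays.asList(
            Arrays.asList(1, 3, 50, 52), Arrays.asList(2, 10, 54)));
        System.out.println(merged); // [1, 2, 3, 10, 50, 52, 54]
    }
}
```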
> > > 2013/2/28 Thomas Jungblut <[email protected]>
> > >
> > >> Seems that they are not correctly sorted:
> > >>
> > >> vertexID: 50
> > >> vertexID: 52
> > >> vertexID: 54
> > >> vertexID: 56
> > >> vertexID: 58
> > >> vertexID: 61
> > >> ...
> > >> vertexID: 78
> > >> vertexID: 81
> > >> vertexID: 83
> > >> vertexID: 85
> > >> ...
> > >> vertexID: 94
> > >> vertexID: 96
> > >> vertexID: 98
> > >> vertexID: 1
> > >> vertexID: 10
> > >> vertexID: 12
> > >> vertexID: 14
> > >> vertexID: 16
> > >> vertexID: 18
> > >> vertexID: 21
> > >> vertexID: 23
> > >> vertexID: 25
> > >> vertexID: 27
> > >> vertexID: 29
> > >> vertexID: 3
> > >>
> > >> So this won't work correctly then...
> > >>
> > >>
> > >> 2013/2/28 Thomas Jungblut <[email protected]>
> > >>
> > >>> sure, have fun on your holidays.
> > >>>
> > >>>
> > >>> 2013/2/28 Edward J. Yoon <[email protected]>
> > >>>
> > >>>> Sure, but if you can fix it quickly, please do. March 1 is a
> > >>>> holiday [1], so I'll be back next week.
> > >>>>
> > >>>> 1. http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea
> > >>>>
> > >>>> On Thu, Feb 28, 2013 at 6:36 PM, Thomas Jungblut
> > >>>> <[email protected]> wrote:
> > >>>> > Maybe 50 is missing from the file; I didn't check whether all
> > >>>> > items were added. As far as I remember, I copy/pasted the ID logic
> > >>>> > into the fastgen. Want to have a look into it?
> > >>>> >
> > >>>> > 2013/2/28 Edward J. Yoon <[email protected]>
> > >>>> >
> > >>>> >> I guess it's a bug in fastgen when it generates the adjacency
> > >>>> >> matrix into multiple files.
> > >>>> >>
> > >>>> >> On Thu, Feb 28, 2013 at 6:29 PM, Thomas Jungblut
> > >>>> >> <[email protected]> wrote:
> > >>>> >> > You have two files, are they partitioned correctly?
> > >>>> >> >
> > >>>> >> > 2013/2/28 Edward J. Yoon <[email protected]>
> > >>>> >> >
> > >>>> >> >> It looks like a bug.
> > >>>> >> >>
> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/
> > >>>> >> >> total 44
> > >>>> >> >> drwxrwxr-x  3 edward edward  4096  2월 28 18:03 .
> > >>>> >> >> drwxrwxrwt 19 root   root   20480  2월 28 18:04 ..
> > >>>> >> >> -rwxrwxrwx  1 edward edward  2243  2월 28 18:01 part-00000
> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01 .part-00000.crc
> > >>>> >> >> -rwxrwxrwx  1 edward edward  2251  2월 28 18:01 part-00001
> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01 .part-00001.crc
> > >>>> >> >> drwxrwxr-x  2 edward edward  4096  2월 28 18:03 partitions
> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/partitions/
> > >>>> >> >> total 24
> > >>>> >> >> drwxrwxr-x 2 edward edward 4096  2월 28 18:03 .
> > >>>> >> >> drwxrwxr-x 3 edward edward 4096  2월 28 18:03 ..
> > >>>> >> >> -rwxrwxrwx 1 edward edward 2932  2월 28 18:03 part-00000
> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03 .part-00000.crc
> > >>>> >> >> -rwxrwxrwx 1 edward edward 2955  2월 28 18:03 part-00001
> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03 .part-00001.crc
> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$
> > >>>> >> >>
> > >>>> >> >>
> > >>>> >> >> On Thu, Feb 28, 2013 at 5:27 PM, Edward <[email protected]> wrote:
> > >>>> >> >> > yes i'll check again
> > >>>> >> >> >
> > >>>> >> >> > Sent from my iPhone
> > >>>> >> >> >
> > >>>> >> >> > On Feb 28, 2013, at 5:18 PM, Thomas Jungblut <[email protected]> wrote:
> > >>>> >> >> >
> > >>>> >> >> >> Can you verify an observation for me please?
> > >>>> >> >> >>
> > >>>> >> >> >> 2 files are created from fastgen, part-00000 and
> > >>>> >> >> >> part-00001, both ~2.2kb in size.
> > >>>> >> >> >> In the partition directory below, there is only a single
> > >>>> >> >> >> 5.56kb file.
> > >>>> >> >> >>
> > >>>> >> >> >> Is it intended for the partitioner to write a single file
> > >>>> >> >> >> if you configured two?
> > >>>> >> >> >> It even reads it as two files, strange huh?
> > >>>> >> >> >>
> > >>>> >> >> >> 2013/2/28 Thomas Jungblut <[email protected]>
> > >>>> >> >> >>
> > >>>> >> >> >>> Will have a look into it.
> > >>>> >> >> >>>
> > >>>> >> >> >>> gen fastgen 100 10 /tmp/randomgraph 1
> > >>>> >> >> >>> pagerank /tmp/randomgraph /tmp/pageout
> > >>>> >> >> >>>
> > >>>> >> >> >>> did work for me the last time I profiled; maybe the
> > >>>> >> >> >>> partitioning doesn't partition correctly with this input,
> > >>>> >> >> >>> or something else is wrong.
> > >>>> >> >> >>>
> > >>>> >> >> >>> 2013/2/28 Edward J. Yoon <[email protected]>
> > >>>> >> >> >>>
> > >>>> >> >> >>>> Fastgen input seems not to work for the graph examples.
> > >>>> >> >> >>>>
> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar
> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar gen fastgen 100 10
> > >>>> >> >> >>>> /tmp/randomgraph 2
> > >>>> >> >> >>>> 13/02/28 10:32:02 WARN util.NativeCodeLoader: Unable to load
> > >>>> >> >> >>>> native-hadoop library for your platform... using builtin-java
> > >>>> >> >> >>>> classes where applicable
> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.LocalBSPRunner: Setting up a new
> > >>>> >> >> >>>> barrier for 2 tasks!
> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Current supersteps number: 0
> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: The total number of supersteps: 0
> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Counters: 3
> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
> > >>>> >> >> >>>> org.apache.hama.bsp.JobInProgress$JobCounter
> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     SUPERSTEPS=0
> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=2
> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:
> > >>>> >> >> >>>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     TASK_OUTPUT_RECORDS=100
> > >>>> >> >> >>>> Job Finished in 3.212 seconds
> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar
> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT
> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT-javadoc.jar
> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT.jar
> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar
> > >>>> >> >> >>>> examples/target/hama-examples-0.7.0-SNAPSHOT.jar pagerank
> > >>>> >> >> >>>> /tmp/randomgraph /tmp/pageour
> > >>>> >> >> >>>> 13/02/28 10:32:29 WARN util.NativeCodeLoader: Unable to load
> > >>>> >> >> >>>> native-hadoop library for your platform... using builtin-java
> > >>>> >> >> >>>> classes where applicable
> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total input paths
> > >>>> >> >> >>>> to process : 2
> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total input paths
> > >>>> >> >> >>>> to process : 2
> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.LocalBSPRunner: Setting up a new
> > >>>> >> >> >>>> barrier for 2 tasks!
> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Current supersteps number: 1
> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: The total number of supersteps: 1
> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Counters: 6
> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
> > >>>> >> >> >>>> org.apache.hama.bsp.JobInProgress$JobCounter
> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     SUPERSTEPS=1
> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=2
> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:
> > >>>> >> >> >>>> org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=4
> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     IO_BYTES_READ=4332
> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=14
> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     TASK_INPUT_RECORDS=100
> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.FileInputFormat: Total input paths
> > >>>> >> >> >>>> to process : 2
> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.LocalBSPRunner: Setting up a new
> > >>>> >> >> >>>> barrier for 2 tasks!
> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50 vertices are
> > >>>> >> >> >>>> loaded into local:1
> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50 vertices are
> > >>>> >> >> >>>> loaded into local:0
> > >>>> >> >> >>>> 13/02/28 10:32:33 ERROR bsp.LocalBSPRunner: Exception
> > >>>> >> >> >>>> during BSP execution!
> > >>>> >> >> >>>> java.lang.IllegalArgumentException: Messages must never be
> > >>>> >> >> >>>> behind the vertex in ID! Current Message ID: 1 vs. 50
> > >>>> >> >> >>>>        at org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
> > >>>> >> >> >>>>        at org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
> > >>>> >> >> >>>>        at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
> > >>>> >> >> >>>>        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
> > >>>> >> >> >>>>        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
> > >>>> >> >> >>>>        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > >>>> >> >> >>>>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > >>>> >> >> >>>>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> > >>>> >> >> >>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> > >>>> >> >> >>>>        at java.lang.Thread.run(Thread.java:722)
> > >>>> >> >> >>>>
> > >>>> >> >> >>>>
> > >>>> >> >> >>>> --
> > >>>> >> >> >>>> Best Regards, Edward J. Yoon
> > >>>> >> >> >>>> @eddieyoon
> > >>>> >> >> >>>
> > >>>> >> >> >>>
> > >>>> >> >>
> > >>>> >> >>
> > >>>> >> >>
> > >>>> >> >> --
> > >>>> >> >> Best Regards, Edward J. Yoon
> > >>>> >> >> @eddieyoon
> > >>>> >> >>
> > >>>> >>
> > >>>> >>
> > >>>> >>
> > >>>> >> --
> > >>>> >> Best Regards, Edward J. Yoon
> > >>>> >> @eddieyoon
> > >>>> >>
> > >>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Best Regards, Edward J. Yoon
> > >>>> @eddieyoon
> > >>>>
> > >>>
> > >>>
> > >>
> >
> >
> >
> > --
> > Best Regards, Edward J. Yoon
> > @eddieyoon
> >
>
