[ https://issues.apache.org/jira/browse/HAMA-750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643221#comment-13643221 ]
Edward J. Yoon commented on HAMA-750:
-------------------------------------
{code}
OK. MRQL works fine now with Hama 0.7.0 in distributed mode.
I haven't tested it on a real cluster yet.
I am attaching the output from pagerank.
By the way, Hama 0.7.0 runs 2 jobs for each BSPjob, although the first is fast.
Is this done to distribute the data among peers?
Leonidas
13/04/26 10:13:50 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
*** Using 8 BSP tasks (out of a max 8). Each task will handle about 2525538 bytes of input data.
13/04/26 10:13:50 INFO bsp.FileInputFormat: Total input paths to process : 1
13/04/26 10:13:50 INFO bsp.FileInputFormat: Total input paths to process : 1
13/04/26 10:13:50 INFO bsp.BSPJobClient: Running job: job_201304260948_0020
13/04/26 10:13:53 INFO bsp.BSPJobClient: Current supersteps number: 0
13/04/26 10:14:02 INFO bsp.BSPJobClient: Current supersteps number: 2
13/04/26 10:14:05 INFO bsp.BSPJobClient: The total number of supersteps: 2
13/04/26 10:14:05 INFO bsp.BSPJobClient: Counters: 6
13/04/26 10:14:05 INFO bsp.BSPJobClient: org.apache.hama.bsp.JobInProgress$JobCounter
13/04/26 10:14:05 INFO bsp.BSPJobClient: SUPERSTEPS=2
13/04/26 10:14:05 INFO bsp.BSPJobClient: LAUNCHED_TASKS=1
13/04/26 10:14:05 INFO bsp.BSPJobClient: org.apache.hama.bsp.BSPPeerImpl$PeerCounter
13/04/26 10:14:05 INFO bsp.BSPJobClient: SUPERSTEP_SUM=2
13/04/26 10:14:05 INFO bsp.BSPJobClient: TIME_IN_SYNC_MS=178
13/04/26 10:14:05 INFO bsp.BSPJobClient: IO_BYTES_READ=20204222
13/04/26 10:14:05 INFO bsp.BSPJobClient: TASK_INPUT_RECORDS=918362
13/04/26 10:14:05 INFO bsp.FileInputFormat: Total input paths to process : 8
13/04/26 10:14:06 INFO bsp.BSPJobClient: Running job: job_201304260948_0019
13/04/26 10:14:09 INFO bsp.BSPJobClient: Current supersteps number: 0
13/04/26 10:14:18 INFO bsp.BSPJobClient: Current supersteps number: 2
13/04/26 10:14:30 INFO bsp.BSPJobClient: Current supersteps number: 3
13/04/26 10:14:33 INFO bsp.BSPJobClient: Current supersteps number: 4
13/04/26 10:14:36 INFO bsp.BSPJobClient: Current supersteps number: 5
13/04/26 10:14:42 INFO bsp.BSPJobClient: Current supersteps number: 6
13/04/26 10:14:45 INFO bsp.BSPJobClient: Current supersteps number: 8
13/04/26 10:14:54 INFO bsp.BSPJobClient: Current supersteps number: 11
13/04/26 10:15:03 INFO bsp.BSPJobClient: Current supersteps number: 14
13/04/26 10:15:12 INFO bsp.BSPJobClient: Current supersteps number: 18
13/04/26 10:15:15 INFO bsp.BSPJobClient: Current supersteps number: 19
13/04/26 10:15:15 INFO bsp.BSPJobClient: The total number of supersteps: 19
13/04/26 10:15:15 INFO bsp.BSPJobClient: Counters: 9
13/04/26 10:15:15 INFO bsp.BSPJobClient: org.apache.hama.bsp.JobInProgress$JobCounter
13/04/26 10:15:15 INFO bsp.BSPJobClient: SUPERSTEPS=19
13/04/26 10:15:15 INFO bsp.BSPJobClient: LAUNCHED_TASKS=8
13/04/26 10:15:15 INFO bsp.BSPJobClient: org.apache.hama.bsp.BSPPeerImpl$PeerCounter
13/04/26 10:15:15 INFO bsp.BSPJobClient: SUPERSTEP_SUM=152
13/04/26 10:15:15 INFO bsp.BSPJobClient: TIME_IN_SYNC_MS=132721
13/04/26 10:15:15 INFO bsp.BSPJobClient: IO_BYTES_READ=22986388
13/04/26 10:15:15 INFO bsp.BSPJobClient: TOTAL_MESSAGES_SENT=5694804
13/04/26 10:15:15 INFO bsp.BSPJobClient: TASK_INPUT_RECORDS=918362
13/04/26 10:15:15 INFO bsp.BSPJobClient: COMPRESSED_MESSAGES=8
13/04/26 10:15:15 INFO bsp.BSPJobClient: TOTAL_MESSAGES_RECEIVED=5694804
{code}
Works well now. I'll commit today.
> Determine the path of partition files
> -------------------------------------
>
> Key: HAMA-750
> URL: https://issues.apache.org/jira/browse/HAMA-750
> Project: Hama
> Issue Type: Bug
> Components: bsp core
> Reporter: Edward J. Yoon
> Assignee: Edward J. Yoon
> Fix For: 0.7.0
>
> Attachments: HAMA-750.patch, HAMA-750_v02.patch
>
>
> The parent directory of the input file was used as the base directory
> for partition files. This is a problem when the input consists of multiple files.
> {code}
> protected BSPJob partition(BSPJob job, int maxTasks) throws IOException {
>   String inputPath = job.getConfiguration().get(Constants.JOB_INPUT_DIR);
>   Path inputDir = new Path(inputPath);
>   if (fs.isFile(inputDir)) {
>     inputDir = inputDir.getParent();
>   }
>   Path partitionDir = new Path(inputDir + "/partitions");
>   if (fs.exists(partitionDir)) {
>     fs.delete(partitionDir, true);
>   }
> {code}
> A simple fix is to create the partitions in a temp directory instead. For example,
> /tmp/hama-partitions/{$JOB_NAME}/
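
A minimal sketch of the idea in the description: derive the partition directory from the job name rather than from the input path, so the result does not depend on whether the input is one file or several. This is only an illustration and is not taken from the attached patches. The class name PartitionPaths, the method partitionDir, and the job-name sanitizing rule are all hypothetical. It uses plain strings rather than Hadoop's Path so that it runs without any cluster libraries.

```java
class PartitionPaths {
    // Base directory taken from the example in the issue description.
    static final String BASE = "/tmp/hama-partitions";

    /**
     * Returns a partition directory that depends only on the job name,
     * not on the location or number of input files.
     */
    public static String partitionDir(String jobName) {
        // Assumption: job names may contain characters that are awkward in a
        // path segment (spaces, slashes, dots), so map them to underscores.
        String safe = jobName.trim().replaceAll("[^A-Za-z0-9_-]", "_");
        if (safe.isEmpty()) {
            throw new IllegalArgumentException("job name must not be empty");
        }
        return BASE + "/" + safe;
    }

    public static void main(String[] args) {
        System.out.println(partitionDir("pagerank")); // prints /tmp/hama-partitions/pagerank
    }
}
```

In the real method, the resulting string would presumably be wrapped in a Hadoop Path and cleaned up with fs.exists / fs.delete as in the excerpt above. Two jobs sharing a name would still collide under this scheme, so a job ID might be a safer key than the name.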