[
https://issues.apache.org/jira/browse/HAMA-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719478#comment-13719478
]
Ikhtiyor Ahmedov commented on HAMA-781:
---------------------------------------
The file name in the part of the code you mentioned is the same as the input
path; in my case it is ratings1M.dat.
What I have found so far:
The file name is the same regardless of the number of tasks. The difference is
in this condition:
{code:title=BSPJobClient.java_line_562}job.get("bsp.partitioning.runner.job") == null{code}
For numTasks = 2, 3 or 5 this condition is false; for numTasks = 4 it is true.
Why is "bsp.partitioning.runner.job" null?
This configuration is set in only one place, the partition() API in
BSPJobClient.java, at line 440. That code is skipped because of the condition
at line 411: (numTasks != numSplits) == false and
Constants.ENABLE_RUNTIME_PARTITIONING == false (this configuration is set to
true by default in only 2 files, TestPartitioning and GraphJob).
Because of this, the configuration value is null, which affects the following
condition:
{code:title=BSPJobClient.java_line_560}
if (split.getClass().getName().equals(FileSplit.class.getName())
    && job.getConfiguration().get(Constants.RUNTIME_PARTITIONING_CLASS) != null
    && job.get("bsp.partitioning.runner.job") == null) {
  LOG.debug(((FileSplit) split).getPath().getName());
  String[] extractPartitionID = ((FileSplit) split).getPath().getName()
      .split("[-]");
  rawSplit.setPartitionID(Integer.parseInt(extractPartitionID[1]));
}
{code}
where
{code:title=BSPJobClient.java_line_564}
// ((FileSplit) split).getPath().getName() == "ratings1M.dat"
String[] extractPartitionID = ((FileSplit) split).getPath().getName()
    .split("[-]"); // == ["ratings1M.dat"], 1 element
// extractPartitionID[1] is out of bounds
rawSplit.setPartitionID(Integer.parseInt(extractPartitionID[1]));
{code}
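The failure can be reproduced outside Hama with plain Java. The sketch below is only an illustration: the class PartitionIdDemo, the helper extractPartitionId and the file name "part-00003" are hypothetical and not part of BSPJobClient. It shows that splitting a name without "-" yields a single element, and one possible way to guard the parse.

```java
public class PartitionIdDemo {

    // Hypothetical helper (not Hama API): returns the number after the first
    // "-" in the file name, or -1 when the name carries no partition ID.
    static int extractPartitionId(String fileName) {
        String[] parts = fileName.split("[-]");
        if (parts.length < 2) {
            return -1; // e.g. "ratings1M.dat" has no "-" separator
        }
        try {
            return Integer.parseInt(parts[1]);
        } catch (NumberFormatException e) {
            return -1; // the second token is not a number
        }
    }

    public static void main(String[] args) {
        // Same expression as in BSPJobClient, applied to the raw input name.
        String[] parts = "ratings1M.dat".split("[-]");
        System.out.println("elements: " + parts.length);

        try {
            Integer.parseInt(parts[1]); // the unguarded access from the issue
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unguarded access failed: " + e);
        }

        // Guarded variant; "part-00003" is an assumed partition file name.
        System.out.println(extractPartitionId("ratings1M.dat"));
        System.out.println(extractPartitionId("part-00003"));
    }
}
```

Returning -1 is just one choice for the sketch; whether a split without a partition ID should be skipped or rejected depends on what the caller expects.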
The following code block leads to this situation:
{code:title=BSPJobClient.java_line_411}
if ((numTasks > 0 && numTasks != numSplits)
    || (job.getConfiguration().getBoolean(
        Constants.ENABLE_RUNTIME_PARTITIONING, false) && job
        .getConfiguration().get(Constants.RUNTIME_PARTITIONING_CLASS) != null)) {
  if (numTasks == 0) {
    numTasks = numSplits;
  }
  HamaConfiguration conf = new HamaConfiguration(job.getConfiguration());
  conf.setInt(Constants.RUNTIME_DESIRED_PEERS_COUNT, numTasks);
  if (job.getConfiguration().get(Constants.RUNTIME_PARTITIONING_DIR) != null) {
    conf.set(Constants.RUNTIME_PARTITIONING_DIR, job.getConfiguration()
        .get(Constants.RUNTIME_PARTITIONING_DIR));
  }
  if (job.getConfiguration().get(Constants.RUNTIME_PARTITIONING_CLASS) != null) {
    conf.set(Constants.RUNTIME_PARTITIONING_CLASS,
        job.get(Constants.RUNTIME_PARTITIONING_CLASS));
  }
  BSPJob partitioningJob = new BSPJob(conf);
  LOG.debug("partitioningJob input: "
      + partitioningJob.get(Constants.JOB_INPUT_DIR));
  partitioningJob.setInputFormat(job.getInputFormat().getClass());
  partitioningJob.setInputKeyClass(job.getInputKeyClass());
  partitioningJob.setInputValueClass(job.getInputValueClass());
  partitioningJob.setOutputFormat(NullOutputFormat.class);
  partitioningJob.setOutputKeyClass(NullWritable.class);
  partitioningJob.setOutputValueClass(NullWritable.class);
  partitioningJob.setBspClass(PartitioningRunner.class);
  partitioningJob.set("bsp.partitioning.runner.job", "true");
  partitioningJob.getConfiguration().setBoolean(
      Constants.ENABLE_RUNTIME_PARTITIONING, false);
  partitioningJob.setOutputPath(partitionDir);
  boolean isPartitioned = false;
  try {
    isPartitioned = partitioningJob.waitForCompletion(true);
  } catch (InterruptedException e) {
    LOG.error("Interrupted partitioning run-time.", e);
  } catch (ClassNotFoundException e) {
    LOG.error("Class not found error partitioning run-time.", e);
  }
  if (isPartitioned) {
    if (job.getConfiguration().get(Constants.RUNTIME_PARTITIONING_DIR) != null) {
      job.setInputPath(new Path(conf
          .get(Constants.RUNTIME_PARTITIONING_DIR)));
    } else {
      job.setInputPath(partitionDir);
    }
    job.setBoolean("input.has.partitioned", true);
    job.setInputFormat(NonSplitSequenceFileInputFormat.class);
  } else {
    LOG.error("Error partitioning the input path.");
    throw new IOException("Runtime partition failed for the job.");
  }
}
{code}
If we remove the condition numTasks != numSplits, everything seems to work
fine.
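To make the effect of the guard at line 411 easier to see, here is a small standalone sketch of the same boolean expression. It is not Hama code: the class PartitionGuardDemo and the method runsPartitioningJob are hypothetical, and numSplits = 4 is an assumption inferred from the observation that only numTasks = 4 fails.

```java
public class PartitionGuardDemo {

    // Mirrors the shape of the condition at BSPJobClient.java line 411.
    // All parameters are plain values standing in for the job configuration.
    static boolean runsPartitioningJob(int numTasks, int numSplits,
            boolean enableRuntimePartitioning, String partitioningClass) {
        return (numTasks > 0 && numTasks != numSplits)
                || (enableRuntimePartitioning && partitioningClass != null);
    }

    public static void main(String[] args) {
        int numSplits = 4;                          // assumption, see lead-in
        String partitioningClass = "HashPartitioner"; // placeholder value
        boolean enabled = false;                    // not enabled for this job

        for (int numTasks : new int[] {2, 3, 4, 5}) {
            boolean runs = runsPartitioningJob(numTasks, numSplits, enabled,
                    partitioningClass);
            System.out.println("numTasks=" + numTasks
                    + " -> partitioning job runs: " + runs);
        }
    }
}
```

Under these assumptions the partitioning job is skipped only when numTasks equals numSplits, so the input keeps its original file name and the partition ID parse fails.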
> Setting partition split fails in local mode when file size is big and has a
> runtime partition (HashParitioner)
> --------------------------------------------------------------------------------------------------------------
>
> Key: HAMA-781
> URL: https://issues.apache.org/jira/browse/HAMA-781
> Project: Hama
> Issue Type: Bug
> Components: bsp core
> Reporter: Ikhtiyor Ahmedov
> Priority: Minor
> Attachments: HAMA-781.patch
>
>
> when input partitioner set to HashPartitioner and file size is big in local
> mode; in line 566 of BSPJobClient.java throws index out of bound exception.