I've changed only computeGoalSize().
protected long computeGoalSize(int numSplits, long totalSize) {
- return totalSize / (numSplits == 0 ? 1 : numSplits);
+ // The minus 1 is for the remainder.
+ return totalSize / (numSplits <= 1 ? 1 : numSplits - 1);
}
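To make the arithmetic of the change concrete, here is a small standalone sketch that puts the old and new goal-size formulas side by side. `GoalSizeDemo` and `countChunks` are hypothetical names. `countChunks` is a deliberately simplified model that just cuts the input into pieces of at most `goalSize` bytes. A Hadoop-style FileInputFormat also takes the block size, a minimum split size and a slop factor into account, so real split counts can differ.

```java
public class GoalSizeDemo {

  // Old formula: divide evenly by the requested number of splits.
  public static long goalSizeBefore(int numSplits, long totalSize) {
    return totalSize / (numSplits == 0 ? 1 : numSplits);
  }

  // New formula from the diff: divide by numSplits - 1 so that the
  // integer-division remainder is meant to land in one extra, smaller split.
  public static long goalSizeAfter(int numSplits, long totalSize) {
    return totalSize / (numSplits <= 1 ? 1 : numSplits - 1);
  }

  // Hypothetical, simplified chunker: number of pieces of at most goalSize
  // bytes needed to cover totalSize. Ignores block size, minimum split size,
  // slop factor and record boundaries.
  public static long countChunks(long totalSize, long goalSize) {
    long chunk = Math.max(1L, goalSize); // a piece always covers at least 1 byte
    return (totalSize + chunk - 1) / chunk; // ceiling division
  }

  public static void main(String[] args) {
    int numSplits = 3;
    long[] sizes = {100L, 101L};
    for (long total : sizes) {
      long before = goalSizeBefore(numSplits, total);
      long after = goalSizeAfter(numSplits, total);
      System.out.printf("total=%d before: goal=%d chunks=%d | after: goal=%d chunks=%d%n",
          total, before, countChunks(total, before), after, countChunks(total, after));
    }
  }
}
```

In this toy model, with numSplits = 3 a 100-byte input yields 4 chunks with the old formula and 2 with the new one, while a 101-byte input yields 4 and 3 respectively. Whether the "minus 1" produces exactly numSplits pieces therefore depends on whether the division leaves a remainder.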
I don't remember exactly what happens when a split doesn't fall on a record boundary.
Tommaso, does your job work correctly with different 'tasknum' values on the same input?
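On the record-boundary question: assuming the input is read by a Hadoop-style line record reader, the usual convention is that a reader whose split does not start at offset 0 discards everything up to and including the first newline, and every reader keeps reading whole lines as long as the next line starts at or before its split end. The sketch below models that rule on an in-memory byte array; `SplitBoundaryDemo` and `linesForSplit` are hypothetical names, not Hama API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SplitBoundaryDemo {

  // Returns the lines "owned" by the byte range [start, end) under the
  // Hadoop-style convention. A line starting exactly at a non-zero split
  // start belongs to the previous split, which reads one line past its end.
  public static List<String> linesForSplit(byte[] data, int start, int end) {
    List<String> lines = new ArrayList<>();
    int pos = start;
    if (start != 0) {
      // Skip the (possibly partial) first line; the previous split reads it.
      while (pos < data.length && data[pos] != '\n') pos++;
      pos++; // step over the '\n' itself
    }
    // Read whole lines while the next line starts at or before 'end'.
    while (pos <= end && pos < data.length) {
      int lineStart = pos;
      while (pos < data.length && data[pos] != '\n') pos++;
      lines.add(new String(data, lineStart, pos - lineStart, StandardCharsets.UTF_8));
      pos++; // skip the '\n' (or step past the end of the data)
    }
    return lines;
  }

  public static void main(String[] args) {
    byte[] data = "aaa\nbbb\nccc\n".getBytes(StandardCharsets.UTF_8);
    // Two splits whose boundary (offset 6) falls inside "bbb".
    System.out.println(linesForSplit(data, 0, 6));  // [aaa, bbb]
    System.out.println(linesForSplit(data, 6, 12)); // [ccc]
  }
}
```

With this rule, no line is duplicated or lost across adjacent splits, but a split that starts and ends inside one long line owns no records at all (e.g. bytes [2, 5) of "aaaaaaaa\nb\n"). A task that expects at least one record from its split would then find nothing to read; whether that is what happens here would need checking against the actual reader.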
On Thu, Nov 15, 2012 at 6:23 PM, Thomas Jungblut
<[email protected]> wrote:
> Edward changed something in the split behaviour last night. Maybe that broke
> it.
>
> 2012/11/15 Tommaso Teofili <[email protected]>
>
>> Hi guys,
>>
>> I was just running a couple of tests with GradientDescentBSP when I
>> realized that, with the newly installed RC5, the algorithm fails at the very
>> beginning because it apparently cannot read its input.
>>
>> java.io.IOException: cannot read input vector size
>>     at org.apache.hama.ml.regression.GradientDescentBSP.getXSize(GradientDescentBSP.java:268)
>>     at org.apache.hama.ml.regression.GradientDescentBSP.getInitialTheta(GradientDescentBSP.java:244)
>>     at org.apache.hama.ml.regression.GradientDescentBSP.bsp(GradientDescentBSP.java:72)
>>     at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:254)
>>     at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:284)
>>     at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>     at java.lang.Thread.run(Thread.java:680)
>>
>>
>> Since I didn't change anything on that side, and it works with
>> 0.6.0-SNAPSHOT, I wonder if the latest input split changes caused the
>> problem.
>>
>> WDYT?
>>
>> Tommaso
>>
>> p.s.:
>> I noticed this just after my +1 on the RC vote, but please keep it on hold
>> while we track this issue.
>>
--
Best Regards, Edward J. Yoon
@eddieyoon