Most interesting is that we took this code from Hadoop, so the bug must also be present in Hadoop.
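
To make the effect of the computeGoalSize() change quoted below concrete, here is a minimal sketch comparing the two formulas. It is a simplified model and not the actual FileInputFormat code: it assumes the splitter cuts the input into fixed-size chunks of goalSize bytes with the remainder as an extra chunk, and it ignores block size, minimum split size and any slop factor. The class name and countChunks() are hypothetical helpers for illustration only.

```java
class GoalSizeSketch {
    // Formula before the change quoted in this thread.
    static long oldGoalSize(int numSplits, long totalSize) {
        return totalSize / (numSplits == 0 ? 1 : numSplits);
    }

    // Formula after the change; the "- 1" leaves room for the remainder.
    static long newGoalSize(int numSplits, long totalSize) {
        return totalSize / (numSplits <= 1 ? 1 : numSplits - 1);
    }

    // Hypothetical helper (assumption, not Hadoop/Hama API): number of chunks
    // when totalSize bytes are cut into goalSize-byte pieces, with any
    // remainder counted as one extra chunk.
    static long countChunks(long totalSize, long goalSize) {
        if (goalSize <= 0) {
            return totalSize > 0 ? 1 : 0;
        }
        return (totalSize + goalSize - 1) / goalSize;
    }

    public static void main(String[] args) {
        long totalSize = 100; // bytes, made-up example value
        int numSplits = 7;    // requested number of tasks, made-up example value

        long oldGoal = oldGoalSize(numSplits, totalSize);
        long newGoal = newGoalSize(numSplits, totalSize);

        System.out.println("old goal=" + oldGoal
                + " chunks=" + countChunks(totalSize, oldGoal));
        System.out.println("new goal=" + newGoal
                + " chunks=" + countChunks(totalSize, newGoal));
    }
}
```

Under these assumptions, integer division in the old formula can leave a remainder that becomes one chunk more than the number of tasks requested, while the new formula keeps the chunk count at or below the request for this input. Whether that matches the real splitter for every input size is exactly what needs checking.
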

2012/11/15 Edward J. Yoon <[email protected]>

> I think, we have to fix InputFormatters, BSPJobClient, and splitter in
> FileInputFormat (+ unit tests). I'm not sure when can I do it.
>
> On Thu, Nov 15, 2012 at 10:58 PM, Tommaso Teofili
> <[email protected]> wrote:
> > I've tried and it works with a small no of tasks (< 19) but it fails if
> > it's not set (so getting the default behavior).
> > I'm not sure I understand the rationale of the fix without going deeper
> > into the code, I'm just concerned if this is just a corner case or may
> > affect some others which would be bad.
> > I see that adding some more lines to my test file the error doesn't occur
> > anymore ...
> >
> > If that is not a major issue but just a corner case then it's ok otherwise
> > I think it'd be better to fix before releasing.
> > Regards,
> > Tommaso
> >
> > 2012/11/15 Edward J. Yoon <[email protected]>
> >
> >> > Tommaso, your job works with different 'tasknum' correctly for same
> >> > input?
> >>
> >> Not working. (and I found HAMA-476)
> >>
> >> Let's release 0.6 first. I'll fix this problem ASAP, then release 0.6.1.
> >>
> >> What do you think?
> >>
> >> On Thu, Nov 15, 2012 at 7:10 PM, Edward J. Yoon <[email protected]>
> >> wrote:
> >> > I've changed only computeGoalSize().
> >> >
> >> > protected long computeGoalSize(int numSplits, long totalSize) {
> >> > - return totalSize / (numSplits == 0 ? 1 : numSplits);
> >> > + // The minus 1 is for the remainder.
> >> > + return totalSize / (numSplits <= 1 ? 1 : numSplits - 1);
> >> > }
> >> >
> >> > I don't remember exactly what happens if a split is not on a record
> >> > boundary?
> >> >
> >> > Tommaso, your job works with different 'tasknum' correctly for same
> >> > input?
> >> >
> >> > On Thu, Nov 15, 2012 at 6:23 PM, Thomas Jungblut
> >> > <[email protected]> wrote:
> >> >> Edward changed something to the split behavious last night. Maybe it
> >> >> broke it.
> >> >>
> >> >> 2012/11/15 Tommaso Teofili <[email protected]>
> >> >>
> >> >>> Hi guys,
> >> >>>
> >> >>> I was just running a couple of tests with GradientDescentBSP when I
> >> >>> realized that using the newly installed RC5 the algorithm fails at its
> >> >>> very beginning because it seems it cannot read from input.
> >> >>>
> >> >>> java.io.IOException: cannot read input vector size
> >> >>>   at org.apache.hama.ml.regression.GradientDescentBSP.getXSize(GradientDescentBSP.java:268)
> >> >>>   at org.apache.hama.ml.regression.GradientDescentBSP.getInitialTheta(GradientDescentBSP.java:244)
> >> >>>   at org.apache.hama.ml.regression.GradientDescentBSP.bsp(GradientDescentBSP.java:72)
> >> >>>   at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:254)
> >> >>>   at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:284)
> >> >>>   at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
> >> >>>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >> >>>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >> >>>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> >> >>>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >> >>>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >> >>>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >> >>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >> >>>   at java.lang.Thread.run(Thread.java:680)
> >> >>>
> >> >>> Since I didn't change anything on that side and it works with
> >> >>> 0.6.0-SNAPSHOT I wonder if the latest stuff related to input split
> >> >>> caused problems.
> >> >>>
> >> >>> WDYT?
> >> >>>
> >> >>> Tommaso
> >> >>>
> >> >>> p.s.:
> >> >>> I noticed this just after my +1 on the RC vote but please keep it on
> >> >>> hold while we track this issue
> >> >>>
> >> >
> >> > --
> >> > Best Regards, Edward J. Yoon
> >> > @eddieyoon
> >>
> >> --
> >> Best Regards, Edward J. Yoon
> >> @eddieyoon
> >>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
