TextInputFormat has no problem, so I was able to test Graph jobs with the desired number of tasks.
I prefer the first solution.

On Fri, Nov 16, 2012 at 3:00 PM, Thomas Jungblut <[email protected]> wrote:
> The problem is that there were too many tasks for the input to split.
> There are two ways to solve this:
> - make the splitter honor the record boundary, which basically means we
> have to determine the number of records before splitting, which is crazy
> - remove the functionality for a "goal-size" and make the split totally
> based on the filesystem's blocksize, which is safer because it takes care
> of record boundaries.
>
> You choose; I would be +1 on the latter - we would have to make the error
> more transparent to users when a job can't be scheduled then.
>
> 2012/11/16 Edward J. Yoon <[email protected]>
>
>> I didn't look at SequenceFile closely when I implemented the I/O system,
>> so I don't know exactly.
>>
>> FYI, https://twitter.com/QwertyManiac/status/269093180220272640
>>
>> On Thu, Nov 15, 2012 at 11:57 PM, Thomas Jungblut
>> <[email protected]> wrote:
>> > Most interesting is that we took the stuff from Hadoop, so the bug must
>> > also be contained in Hadoop.
>> >
>> > 2012/11/15 Edward J. Yoon <[email protected]>
>> >
>> >> I think we have to fix the InputFormatters, BSPJobClient, and the
>> >> splitter in FileInputFormat (+ unit tests). I'm not sure when I can do it.
>> >>
>> >> On Thu, Nov 15, 2012 at 10:58 PM, Tommaso Teofili
>> >> <[email protected]> wrote:
>> >> > I've tried, and it works with a small number of tasks (< 19), but it
>> >> > fails if it's not set (so getting the default behavior).
>> >> > I'm not sure I understand the rationale of the fix without going deeper
>> >> > into the code; I'm just concerned whether this is just a corner case or
>> >> > may affect some others, which would be bad.
>> >> > I see that after adding some more lines to my test file the error
>> >> > doesn't occur anymore ...
>> >> >
>> >> > If that is not a major issue but just a corner case then it's ok,
>> >> > otherwise I think it'd be better to fix it before releasing.
>> >> > Regards,
>> >> > Tommaso
>> >> >
>> >> > 2012/11/15 Edward J. Yoon <[email protected]>
>> >> >
>> >> >> > Tommaso, your job works with different 'tasknum' correctly for same
>> >> >> > input?
>> >> >>
>> >> >> Not working. (and I found HAMA-476)
>> >> >>
>> >> >> Let's release 0.6 first. I'll fix this problem ASAP, then release 0.6.1.
>> >> >>
>> >> >> What do you think?
>> >> >>
>> >> >> On Thu, Nov 15, 2012 at 7:10 PM, Edward J. Yoon <[email protected]>
>> >> >> wrote:
>> >> >> > I've changed only computeGoalSize().
>> >> >> >
>> >> >> >   protected long computeGoalSize(int numSplits, long totalSize) {
>> >> >> > -   return totalSize / (numSplits == 0 ? 1 : numSplits);
>> >> >> > +   // The minus 1 is for the remainder.
>> >> >> > +   return totalSize / (numSplits <= 1 ? 1 : numSplits - 1);
>> >> >> >   }
>> >> >> >
>> >> >> > I don't remember exactly: what happens if a split is not on a record
>> >> >> > boundary?
>> >> >> >
>> >> >> > Tommaso, your job works with different 'tasknum' correctly for same
>> >> >> > input?
>> >> >> >
>> >> >> > On Thu, Nov 15, 2012 at 6:23 PM, Thomas Jungblut
>> >> >> > <[email protected]> wrote:
>> >> >> >> Edward changed something in the split behaviour last night. Maybe it
>> >> >> >> broke it.
>> >> >> >>
>> >> >> >> 2012/11/15 Tommaso Teofili <[email protected]>
>> >> >> >>
>> >> >> >>> Hi guys,
>> >> >> >>>
>> >> >> >>> I was just running a couple of tests with GradientDescentBSP when I
>> >> >> >>> realized that using the newly installed RC5 the algorithm fails at its
>> >> >> >>> very beginning because it seems it cannot read from the input.
>> >> >> >>>
>> >> >> >>> java.io.IOException: cannot read input vector size
>> >> >> >>>   at org.apache.hama.ml.regression.GradientDescentBSP.getXSize(GradientDescentBSP.java:268)
>> >> >> >>>   at org.apache.hama.ml.regression.GradientDescentBSP.getInitialTheta(GradientDescentBSP.java:244)
>> >> >> >>>   at org.apache.hama.ml.regression.GradientDescentBSP.bsp(GradientDescentBSP.java:72)
>> >> >> >>>   at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:254)
>> >> >> >>>   at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:284)
>> >> >> >>>   at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
>> >> >> >>>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> >> >> >>>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> >> >> >>>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>> >> >> >>>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> >> >> >>>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> >> >> >>>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> >> >> >>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> >> >> >>>   at java.lang.Thread.run(Thread.java:680)
>> >> >> >>>
>> >> >> >>> Since I didn't change anything on that side and it works with
>> >> >> >>> 0.6.0-SNAPSHOT, I wonder if the latest stuff related to input split
>> >> >> >>> caused problems.
>> >> >> >>>
>> >> >> >>> WDYT?
>> >> >> >>>
>> >> >> >>> Tommaso
>> >> >> >>>
>> >> >> >>> p.s.:
>> >> >> >>> I noticed this just after my +1 on the RC vote, but please keep it
>> >> >> >>> on hold while we track this issue.
>> >> >> >>>
>> >> >> >
>> >> >> > --
>> >> >> > Best Regards, Edward J. Yoon
>> >> >> > @eddieyoon
>> >> >>
>> >> >> --
>> >> >> Best Regards, Edward J. Yoon
>> >> >> @eddieyoon
>> >>
>> >> --
>> >> Best Regards, Edward J. Yoon
>> >> @eddieyoon
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon

--
Best Regards, Edward J. Yoon
@eddieyoon
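[Editor's note] For readers following the computeGoalSize() diff quoted in the thread above, a minimal standalone sketch of the arithmetic may help. Only the two return expressions are taken from the diff; the class, method names, and the greedy countSplits loop below are made up for illustration and are not Hama's actual FileInputFormat API, which also considers the filesystem block size, a minimum split size, and record boundaries.

```java
/**
 * Illustrative only: compares the goal-size formulas before and after the
 * quoted change, and counts how many splits a simple greedy splitter would
 * produce for each. Not Hama's real FileInputFormat.
 */
public class GoalSizeDemo {

  // Formula before the change: plain division, so the integer remainder
  // tends to spill into one extra (tiny) split.
  static long oldGoalSize(int numSplits, long totalSize) {
    return totalSize / (numSplits == 0 ? 1 : numSplits);
  }

  // Formula after the change: divide by (numSplits - 1) so each split is a
  // little larger and the remainder is absorbed.
  static long newGoalSize(int numSplits, long totalSize) {
    return totalSize / (numSplits <= 1 ? 1 : numSplits - 1);
  }

  // Simplified greedy splitter: keeps cutting chunks of goalSize bytes
  // until the input is consumed.
  static int countSplits(long totalSize, long goalSize) {
    int splits = 0;
    long remaining = totalSize;
    while (remaining > 0) {
      remaining -= Math.min(goalSize, remaining);
      splits++;
    }
    return splits;
  }

  public static void main(String[] args) {
    long totalSize = 100;   // bytes of input (hypothetical)
    int requestedTasks = 3; // desired number of BSP tasks

    long oldGoal = oldGoalSize(requestedTasks, totalSize);
    long newGoal = newGoalSize(requestedTasks, totalSize);

    // Old: goal 33 -> splits of 33, 33, 33, 1 => 4 splits for 3 tasks.
    System.out.println("old goal=" + oldGoal
        + ", splits=" + countSplits(totalSize, oldGoal));
    // New: goal 50 -> splits of 50, 50 => 2 splits, never more than requested.
    System.out.println("new goal=" + newGoal
        + ", splits=" + countSplits(totalSize, newGoal));
  }
}
```

Under these assumptions, with totalSize = 100 and three requested tasks, the old formula gives a goal size of 33 and four splits (33, 33, 33, 1), i.e. more splits than tasks, while the new formula gives 50 and only two splits; that is what the "minus 1 is for the remainder" comment is getting at, at the cost of sometimes producing fewer splits than the requested task number.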
