Just FYI, one reason is that there are a lot of KeyValue stores.

On Wed, Nov 2, 2011 at 11:46 PM, Thomas Jungblut <[email protected]> wrote:
> Ah okay I see why.
> But I don't see that this is very good. BTW the classes you've added from
> Hadoop are missing the Apache header.
>
> Sorry for spamming.
>
> 2011/11/2 Thomas Jungblut <[email protected]>
>
>> And what is the reason to implement our own Input/output format if you
>> stick with key/value pairs?
>> Let's be compatible with Hadoop and use theirs.
>>
>> And we should really stop copying Hadoop stuff around. It is already
>> there.
>>
>>
>> 2011/11/2 Thomas Jungblut <[email protected]>
>>
>>> Great :)
>>>
>>> Do you have plans to integrate a partitioning? Currently this is just a
>>> block assignment partitioning, hardcoded in the client.
>>> This won't be useful for PageRank and SSSP.
>>> This would help us in Graph package as well for the next release.
>>>
>>> 2011/11/2 Edward J. Yoon <[email protected]>
>>>
>>>> > For sure I agree we should allow the former programming model with no input
>>>> > without explicitly instantiating dummy inputs/splits. What about providing
>>>> > two basic (different) implementations?
>>>>
>>>> +1
>>>>
>>>> I was about to.
>>>> On Wed, Nov 2, 2011 at 9:23 PM, Tommaso Teofili
>>>> <[email protected]> wrote:
>>>> > 2011/11/2 Thomas Jungblut <[email protected]>
>>>> >
>>>> >> Another point while fixing the local runner:
>>>> >>
>>>> >> Are we now input driven?
>>>> >> I see in the code that the user defined task number is overridden by the
>>>> >> number of splits.
>>>> >> Was this your intention? This will actually make realtime processing with
>>>> >> no static input a real pain.
>>>> >> For example if you want a similar behaviour in Hadoop M/R you'll need to
>>>> >> create dummy splits, and this is not what we should aim at.
>>>> >>
>>>> >> We could simply check if the user defines the NullInputFormat or nothing and
>>>> >> then use the number of tasks the user has configured.
>>>> >>
>>>> >
>>>> > For sure I agree we should allow the former programming model with no input
>>>> > without explicitly instantiating dummy inputs/splits. What about providing
>>>> > two basic (different) implementations?
>>>> > Tommaso
>>>> >
>>>> >
>>>> >>
>>>> >> 2011/11/2 Tommaso Teofili <[email protected]>
>>>> >>
>>>> >> > 2011/11/2 Edward J. Yoon <[email protected]>
>>>> >> >
>>>> >> > > > I'm sure that not every job actually needs a cleanup or a setup.
>>>> >> > >
>>>> >> > > You're right. Almost all BSP applications should override the bsp() method,
>>>> >> > > but setup() and cleaner() are not always needed, as you said. Let's fix
>>>> >> > > them.
>>>> >> > >
>>>> >> >
>>>> >> > Agreed +1
>>>> >> >
>>>> >> >
>>>> >> > >
>>>> >> > > > Generally I would suggest to integrate the OutputCollector and the
>>>> >> > > > RecordReader into the BSPPeerImpl.
>>>> >> > > > So our peer is like the context in Hadoop.
>>>> >> > >
>>>> >> > > Good idea.
>>>> >> > >
>>>> >> >
>>>> >> > +1 here too
>>>> >> >
>>>> >> > Tommaso
>>>> >> >
>>>> >> >
>>>> >> > >
>>>> >> > > On Wed, Nov 2, 2011 at 9:03 PM, Thomas Jungblut
>>>> >> > > <[email protected]> wrote:
>>>> >> > > > Yes. When I reworked that API, I made a default implementation in our
>>>> >> > > > abstract BSP class.
>>>> >> > > > So the user has to override the methods for himself, if he needs to.
>>>> >> > > > I'm sure that not every job actually needs a cleanup or a setup.
>>>> >> > > >
>>>> >> > > > Generally I would suggest to integrate the OutputCollector and the
>>>> >> > > > RecordReader into the BSPPeerImpl.
>>>> >> > > > So our peer is like the context in Hadoop.
>>>> >> > > > But that is just a minor thing. It is a great improvement ;)
>>>> >> > > >
>>>> >> > > > 2011/11/2 Edward J. Yoon <[email protected]>
>>>> >> > > >
>>>> >> > > >> There're bsp(), setup() and cleaner() methods.
>>>> >> > > >>
>>>> >> > > >> What is your suggestion?
>>>> >> > > >>
>>>> >> > > >> On Wed, Nov 2, 2011 at 8:47 PM, Thomas Jungblut
>>>> >> > > >> <[email protected]> wrote:
>>>> >> > > >> > Have a look at the combiner class. I know that this is just a "test", but
>>>> >> > > >> > it is really messy if the user does not use the methods, but is forced to
>>>> >> > > >> > override them.
>>>> >> > > >> >
>>>> >> > > >> > 2011/11/2 Edward J. Yoon <[email protected]>
>>>> >> > > >> >
>>>> >> > > >> >> Why?
>>>> >> > > >> >>
>>>> >> > > >> >> On Wed, Nov 2, 2011 at 8:21 PM, Thomas Jungblut
>>>> >> > > >> >> <[email protected]> wrote:
>>>> >> > > >> >> > I totally dislike that BSP class now has abstract methods instead of
>>>> >> > > >> >> > default implementations.
>>>> >> > > >> >> >
>>>> >> > > >> >> > 2011/11/2 Edward J. Yoon <[email protected]>
>>>> >> > > >> >> >
>>>> >> > > >> >> >> Hi all,
>>>> >> > > >> >> >>
>>>> >> > > >> >> >> As you know, combiners and IO were recently added.
>>>> >> > > >> >> >>
>>>> >> > > >> >> >> Please review them from a user's viewpoint.
>>>> >> > > >> >> >>
>>>> >> > > >> >> >> http://svn.apache.org/repos/asf/incubator/hama/trunk/examples/src/main/java/org/apache/hama/examples/PiEstimator.java
>>>> >> > > >> >> >>
>>>> >> > > >> >> >> I'm testing multiple tasks and IO features on a 100-node cluster using
>>>> >> > > >> >> >> 10 tasks per node. If there's no issue, I'll close HAMA-258.
>>>> >> > > >> >> >>
>>>> >> > > >> >> >> Thanks.
>>>> >> > > >> >> >>
>>>> >> > > >> >> >> --
>>>> >> > > >> >> >> Best Regards, Edward J. Yoon
>>>> >> > > >> >> >> @eddieyoon
>>>> >> > > >> >> >>
>>>> >> > > >> >> >
>>>> >> > > >> >> > --
>>>> >> > > >> >> > Thomas Jungblut
>>>> >> > > >> >> > Berlin <[email protected]>
--
Best Regards, Edward J. Yoon
@eddieyoon
