Interesting to see the upper bound for Hadoop. However, I guess this is a rare problem.
I'll try to implement what we have discussed so far and practice with it.

Regards,
Em

On 19.07.2011 21:40, Steve Lewis wrote:
> If a record is too big to be processed by one node, you probably need
> to re-architect around a different record that scales better and
> combines cleanly.
> You also need to ask at the start what data you need to retrieve and
> how you intend to retrieve it. At some point a database may start to
> look like a good solution, although in this case I might think about
> tracking the order of trips up to, say, 16 and using a
> comma-delimited list for the counts.
>
> On Tue, Jul 19, 2011 at 11:14 AM, Em <mailformailingli...@yahoo.de> wrote:
>
> > Of course it won't scale, or at least not as well as your suggested
> > model. Chances are good that my idea is not an option for a
> > production system and not as useful as the less complex variant.
> > So you are right!
> >
> > The reason I asked was to get an idea of what should be done if a
> > record is too big to be processed by a node.
> >
> > Regards,
> > Em
> >
> > On 19.07.2011 19:54, Steve Lewis wrote:
> > > I assumed the problem was to count the number of people visiting
> > > Moscow after London without considering any intermediate stops.
> > > This leads to a data structure which is easy to combine. The
> > > structure you propose adds more information and is difficult to
> > > combine. I doubt it could handle a billion people and recommend
> > > trying with a hundred people visiting 5 out of 20 destinations in
> > > random order to see how bad it gets.
> > >
> > > My schema can handle billions of combinations, assuming only that
> > > the total destinations in any node can be handled, i.e. a billion
> > > people can visit any of a thousand cities in random order and in
> > > the worst case I need a thousand cities and a thousand counts. I
> > > doubt that the schema you propose, with added order information,
> > > will scale to those levels.
> > >
> > > On Tue, Jul 19, 2011 at 10:39 AM, Em <mailformailingli...@yahoo.de> wrote:
> > >
> > > > Thanks!
> > > >
> > > > So you invert the data and then walk through each inverted
> > > > result. Good point!
> > > > What do you think about prefixing each city name with the index
> > > > in the list?
> > > >
> > > > This way you can say:
> > > > London: 1_Moscow:2, 1_Paris:2, 2_Moscow:1, 2_Riga:4, 2_Paris:1,
> > > > 3_Berlin:1...
> > > >
> > > > From this list you can see that people are likely to visit
> > > > Moscow right after London at their first or second journey.
> > > > This would maintain a strong order (whether that's good or bad
> > > > depends on the real-world scenario).
> > > >
> > > > Since your ideas gave me a good starting point for realizing
> > > > this job (I'll practice it), we can make the problem more
> > > > heavyweight, if you like?
> > > >
> > > > What happens to records that are too big to be processed by one
> > > > node? Let's say that from my above example of a strongly
> > > > ordered list one gets a billion combinations, way too many for
> > > > one node (we assume that).
> > > > What possibilities does Hadoop offer to deal with such things?
> > > >
> > > > Regards and many thanks for the insights,
> > > > Em
> > > >
> > > > On 19.07.2011 19:15, Steve Lewis wrote:
> > > > > Assume Joe visits Washington, London, Paris and Moscow.
> > > > >
> > > > > You start with records like
> > > > > Joe:Washington:20-Jan-2011
> > > > > Joe:London:14-Feb-2011
> > > > > Joe:Paris:9-Mar-2011
> > > > >
> > > > > You want
> > > > > Joe: Washington, London, Paris and Moscow
> > > > >
> > > > > For the next step the person is irrelevant; you want
> > > > >
> > > > > Washington: London:1, Paris:1, Moscow:1
> > > > > London: Paris:1, Moscow:1
> > > > > Paris: Moscow:1
> > > > >
> > > > > The first says that after a visit to Washington there was one
> > > > > visit to London, one to Paris and one to Moscow.
> > > > >
> > > > > These records can be combined with those from other people.
> > > > >
> > > > > Now suppose Bill visits London and Moscow.
> > > > > So he generates
> > > > > London: Moscow:1
> > > > >
> > > > > This can be combined with the one from Joe saying
> > > > > London: Paris:1, Moscow:1
> > > > > to give
> > > > >
> > > > > London: Paris:1, Moscow:2
> > > > >
> > > > > Now suppose Sue visits London, Riga and Paris.
> > > > > So she generates
> > > > > London: Paris:1, Riga:1
> > > > >
> > > > > This can be combined with London: Paris:1, Moscow:2 to give
> > > > >
> > > > > London: Paris:2, Moscow:2, Riga:1
> > > > >
> > > > > Note I can keep places in alphabetical order in the result.
> > > > >
> > > > > On Tue, Jul 19, 2011 at 9:53 AM, Em <mailformailingli...@yahoo.de> wrote:
> > > > >
> > > > > > Hi Steven,
> > > > > >
> > > > > > thanks for your response! For ease of use we can make the
> > > > > > assumptions you made; maybe this makes it much easier to
> > > > > > help. Those little extras are something for after solving
> > > > > > the "easy" version of the task. :)
> > > > > >
> > > > > > What do you mean by the following?
> > > > > >
> > > > > > > The second job takes Person : list of places and for
> > > > > > > each place in the list constructs
> > > > > > > place : 1 | place after P : 1 | next place : 1 ...
> > > > > >
> > > > > > You mean something like this?
> > > > > >
> > > > > > Washington DC:1
> > > > > > New York after Washington DC:1
> > > > > > Miami after New York:1
> > > > > >
> > > > > > I do not see the benefit for the result I would like to
> > > > > > get.
> > > > > >
> > > > > > The end result should be something like this:
> > > > > > Washington DC => New York, Miami, Los Angeles
> > > > > > New York => Chicago, Seattle, San Francisco
> > > > > >
> > > > > > The point is that one can see that persons who visited
> > > > > > Washington DC are likely to visit New York as the next
> > > > > > place, Miami as the second and L.A. as the third.
> > > > > > However, if I choose New York as my starting point, I can
> > > > > > see that persons who start their journey in New York (and
> > > > > > maybe weren't in DC before) are likely to visit Chicago,
> > > > > > Seattle and San Francisco. Maybe Los Angeles comes at the
> > > > > > 10th position.
> > > > > >
> > > > > > Regards,
> > > > > > Em
> > > > >
> > > > > --
> > > > > Steven M. Lewis PhD
> > > > > 4221 105th Ave NE
> > > > > Kirkland, WA 98033
> > > > > 206-384-1340 (cell)
> > > > > Skype lordjoe_com
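
For reference, the combinable structure described in the quoted thread (each place maps to counts of the places visited afterwards) can be sketched in plain Python. This is only a single-process illustration under stated assumptions, not Hadoop code: the function names successor_counts and combine are made up for this sketch, and each person's places are assumed to be already sorted by visit date.

```python
from collections import Counter, defaultdict


def successor_counts(places):
    """Per-person step (hypothetical helper): for each place, count every
    place this person visited later. `places` must be in visit order."""
    result = defaultdict(Counter)
    for i, place in enumerate(places):
        for later in places[i + 1:]:
            result[place][later] += 1
    return result


def combine(a, b):
    """Combine step (hypothetical helper): add up the per-place counters of
    two partial results. Addition is associative and commutative, which is
    why partial results can be merged in any order."""
    merged = defaultdict(Counter)
    for partial in (a, b):
        for place, counts in partial.items():
            merged[place].update(counts)  # Counter.update adds counts
    return merged


# The three travellers from the thread.
joe = successor_counts(["Washington", "London", "Paris", "Moscow"])
bill = successor_counts(["London", "Moscow"])
sue = successor_counts(["London", "Riga", "Paris"])

total = combine(combine(joe, bill), sue)
for place in sorted(total):
    print(place, dict(sorted(total[place].items())))
```

With these three inputs the entry for London comes out as Moscow:2, Paris:2, Riga:1, matching the worked example in the thread. A record for one place never holds more entries than there are distinct places, which is the scaling argument made above.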