Re: [mySociety:public] Calculate optimum mid-point w ith TfL data — challenge from b3ta.com

Sam Smith Sun, 29 Aug 2010 07:36:35 -0700

But in the case requested, it was a way to meet for lunch. Where a constrained 
set of arrival times (ie every 15 minutes around lunchtime) is probably useful 
enough.


Especially, if following Matthew's traintimes.org.uk model, you show the 
surrounding options, and the user can visually see that a train 25 minutes 
later gets in 5 minutes later. Sometimes that 5 minutes matters, sometimes it 
doesn't. Or that they'll get a slightly later tube than they say they will 
because the other person will do the same :)



Sam

On 29 Aug 2010, at 14:49, Francis Davey wrote:

> On 28 August 2010 20:28, Paul Robinson <[email protected]> wrote:
>> 
>> It's a travelling salesman problem:
>> http://en.wikipedia.org/wiki/Travelling_salesman_problem
> 
> I'm afraid I disagree. The problems is deterministically polynomial.
> 
> A brute-force approach would go as follows:
> 
> (1) assume you have all graph representing the network of public
> transport you want to search with edges weighted by travel time
> 
> (2) find the all points shortest path using the Floyd-Warshall
> algorithm - which runs in cubic time (its n multiplications of an nxn
> matrix)
> 
> (3) take the rows (or columns) representing the start points of the
> two participants, find their difference (as vectors) this will give
> you the time advantage one of the participants has over the other in
> getting to any point
> 
> (4) now find the least value in that list (I forget - this is
> something like log(N), but not very bad) - that is (or those are if
> there are several) the optimum meeting point.
> 
> So it looks like not much worse than N^3 * LOG(N) in complexity.
> Certainly not like the TSP which is NP-complete.
> 
> Of course that doesn't mean its feasible. If you only want to know
> about tube stops, N isn't very large and you should be able to do all
> the above quite quickly, if its all public transport stops in Greater
> London, then you have a much larger N and I am not sure how long it
> would take. Francis Irving has done a lot of this and will know.
> 
> There are of course lots of clever heuristics you can do. The network
> is embedded (pretty nearly) in a 2D Euclidean space, so you can
> usually do much better than Floyd-Warshall, and you can almost
> certainly prune away points well off the shortest path between the
> participants. However I bet its feasible for the tube at least.
> 
> If the only access you have to tube data is to live tube running data
> from the tfl website (I don't know if the underlying data has been
> "freed" in any sense - I've not looked at this question for a year or
> so) then you might want to know what you can do just by scraping their
> site and not doing too many lookups. That's a harder question.
> 
>> Despite it being two people meeting at one mid-point, from a mathematical
>> point of view it's identical to saying "Given I am in Camden and my 2nd stop
>> must be Clerkenwell, what is the optimal first stop?", and therefore it can
>> be modelled as pure TSP.
> 
> Not so. TSP is NP in the number of stops. With only 1 intervening stop
> its complexity in that variable is irrelevant.
> 
>> As such, it's therefore - unfortunately - NP-hard. That's computer science
>> speak for "Are you kidding me?" :-)
>> In this particular case, it might be possible to work it out relatively
>> quickly based on:
>> - Time of day and scheduled services supposedly happening
>> - Total journey time
>> If you limited it to Tube/clear service/midday start/1pm finish, it starts
>> to get something you could do on paper in an afternoon, and you could brute
>> force it on a computer even quicker.
>> If you wanted to build a tool to take into account bus routes and stops,
>> service disruption "fallback meeting places", costs based on how Oyster
>> would charge it, etc. and then tried to make it more generic to allow any
>> two starting points and meeting place, it starts to scale rather horribly.
>> If somebody comes up with a generic solution applicable for any two points
>> and any meeting point, using any public transport route, and in such a way
>> that is computationally cheap, all done over some spare time over a weekend,
>> I'd say they are likely to be the kind of person considered for the Fields
>> medal or a Nobel prize at some point in their life. :-D
> 
> I wish 8-).
> 
> You can easily weight the solution I gave above (it just factors into
> the "separation" of two points) but then you can't rely on 2D
> embedding so easily for any heuristic you choose. Real-time "there and
> back" adds another layer, but shouldn't more than double how long it
> takes.
> 
> Depending on how ambitious you are being it should be feasible, but
> maybe not for everything on a mobile phone (if its doing the CPU
> intensive work) or on a server with millions of hits for the same
> reason.
> 
>> That said, public transport routes and stations do compose neat directional
>> graphs, and I have been looking at brute-forcing this (the only simple way,
>> albeit computationally pricey), with Manchester data. When I was in London
>> the other week, I couldn't help but look at the tube map and start to see it
>> as an undirected graph begging to be thrown at a few algorithms.
> 
> I am fairly sure that Francis has been doing/has done something
> exactly like this for mapumental.
> 
>> But then I costed doing it using EC2 compute units on the back of an
>> envelope and realised I needed to save up a bit more for that quantum
>> computer. :-)
>> You might say "but hang on, Traveline are able to do it with their Journey
>> Planner applications!", and you'd be sort of right, but that's because
>> they're using a naive graph algorithm that is based on the starting time of
>> the journey and the timetable. I know because I've tried lots of different
>> journeys in that tool and think I've worked out what they're doing.
>> The ATCO CIF data (and TransXChange) formats have the concept of hubs, so
>> you can delete out a crap load of nodes in the graph by aiming for hubs,
>> which simplifies it somewhat - it reduces the graph in Manchester from one
>> of thousands of nodes to one of dozens, essentially.
>> Such an algorithm will not tell you however, "if you were prepared to go 20
>> minutes later, you'd save 17 minutes total journey time and £2.80". It just
>> tries to find a short graph route based on timetable data from what I can
>> see, and it's far from finding an optimal route IME.
>> Is it obvious I've spent a fair bit of time thinking about this? :-)
> 
> Yes. The variable start time problem is rather more difficult because
> it really does add an extra dimension. Its why a lot of travel sites
> fail. They often start by asking me when I am travelling but for long
> distance holidays I really want to know when is best to travel. One
> ends up doing some searching by hand which is surely a mistake.
> 
> I am fairly sure that this has all been studied and solved
> theoretically though. Its just not my area.
> 
> -- 
> Francis Davey
> 
> _______________________________________________
> Mailing list [email protected]
> Archive, settings, or unsubscribe:
> https://secure.mysociety.org/admin/lists/mailman/listinfo/developers-public

-- 
A little inaccuracy can save tons of explanation.
        - Saki



_______________________________________________
Mailing list [email protected]
Archive, settings, or unsubscribe:
https://secure.mysociety.org/admin/lists/mailman/listinfo/developers-public

Re: [mySociety:public] Calculate optimum mid-point w ith TfL data — challenge from b3ta.com

Reply via email to