How many PIDs do we expect an ALTO server to handle?
I ask because I've discovered that code that works fine with a few hundred
pids dies when I go to 5,000 pids.
The biggest problem is the full cost map response message. With 5,000
pids, that has 25 million cost entries. When I went to 5,000 pids, my
off-the-shelf JSON parser ran out of memory. Granted, I'm using Java, so
there's a big memory hit there. But I gave the JVM a 4 gigabyte heap, and
it STILL ran out of memory.
Eventually I was able to parse a 5k-pid CostMap, but I had to completely
re-write the JSON parser.
So my first concern is that if we really expect servers to have 5,000
pids, and if we expect clients to ask for full cost maps, then we need to
warn client authors. Otherwise someone will write a simple client, test it
with a simple ALTO server ... and then have it die when it encounters a
production ALTO server.
My other concern is that the current JSON encoding of a full cost map is
very large and inefficient. For 5,000 pids, a full cost map takes 417
megs. And that's with relatively short pid names (pid_1 thru pid_5000).
Granted that can be compressed, but it's still a lot of data.
If we're concerned about maps with large pid counts, there is a much more
efficient way to present the cost data. Instead of the nested dictionaries
with multiple repetitions of the pid names, we could use three arrays:
srcs
dests
costs
"srcs" & "dests" would be arrays of pid names. The cost array would have
nSrc x nDest costs. costs[0] would be the cost from srcs[0] to dests[0],
costs[1] would be the cost from srcs[0] to dests[1], etc. That is, "costs"
is the cost matrix in row-major order.
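
To make the row-major indexing concrete, here is a minimal sketch in Python. The function name cost() is made up for illustration and is not part of any ALTO draft; a real client would probably build a name-to-index dictionary once rather than call list.index() on every lookup.

```python
def cost(srcs, dests, costs, src, dst):
    """Look up one cost in the flat, row-major "costs" array."""
    i = srcs.index(src)    # row index of the source pid
    j = dests.index(dst)   # column index of the destination pid
    # Each row holds len(dests) entries, so row i starts at i * len(dests).
    return costs[i * len(dests) + j]


srcs = ["pid1", "pid2"]
dests = ["pid3", "pid4"]
costs = [1, 2, 3, 4]

print(cost(srcs, dests, costs, "pid2", "pid3"))  # prints 3
```
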
For example, where the current encoding uses
"map":{
"pid1": {"pid3":1, "pid4":2},
"pid2": {"pid3":3, "pid4":4}
}
the alternate encoding would be:
"srcs": ["pid1","pid2"],
"dests": ["pid3","pid4"],
"costs": [1,2,3,4]
That's not impressive for small pid counts, but it makes a big difference
for large counts. For example, for 5,000 pids:
current encoding: 417 megs
alternate encoding: 148 megs
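
If anyone wants to try the comparison themselves, the following Python sketch builds both encodings for a given pid count and reports the serialized size in bytes. The helper names are hypothetical, every cost is set to the constant 1, and the JSON is written without whitespace, so the absolute numbers will not match the figures above; only the relative difference is meaningful.

```python
import json


def pid_names(n):
    # Same naming scheme as in the text: pid_1 thru pid_n.
    return [f"pid_{i}" for i in range(1, n + 1)]


def nested_cost_map(n):
    # Current encoding: nested dictionaries keyed by pid name.
    names = pid_names(n)
    return {"map": {s: {d: 1 for d in names} for s in names}}


def flat_cost_map(n):
    # Alternate encoding: two name arrays plus a row-major cost array.
    names = pid_names(n)
    return {"srcs": names, "dests": names, "costs": [1] * (n * n)}


def encoded_size(obj):
    # Size in bytes of the compact JSON serialization.
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


for n in (10, 100, 500):
    print(n, encoded_size(nested_cost_map(n)), encoded_size(flat_cost_map(n)))
```

Keep n modest when running this; at 5,000 pids the nested form needs a lot of memory just to build, which is the problem described above.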
What are your thoughts?
- Bill Roome
_______________________________________________
alto mailing list
https://www.ietf.org/mailman/listinfo/alto