> I understand this point, but quite frankly, the *only* reason you try
> to put that on the load balancer is because it's the only component
> you have in front of your servers. What you're describing should
> normally be processed either by passing a token to the static file
> server (thus modifying an existing server for that purpose) or by
> putting the static server before the application server, making it
> proxy requests to the server and sometimes serve static contents
> on valid responses. This would also require changes. But implementing
> the token check in a flexible server such as nginx should not be too
> hard and would also provide the ability to fetch static files from
> many places, as well as using geolocation to use the server closest
> to the user.
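The token scheme described in the quote above — the app issues a short-lived token, and the static server validates it without calling back — can be sketched as an HMAC-signed URL. This is a minimal illustration, not code from the thread; the secret, paths, and parameter names are all placeholders:

```python
import hashlib
import hmac
import time

# Illustrative shared secret, known to both the app and the static server.
SECRET = b"shared-between-app-and-static-server"

def sign(path: str, expires: int) -> str:
    """HMAC over the path and expiry, so neither can be tampered with."""
    msg = f"{path}:{expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

def make_url(path: str, ttl: int = 300) -> str:
    """App side: build a download URL valid for `ttl` seconds."""
    expires = int(time.time()) + ttl
    return f"{path}?e={expires}&t={sign(path, expires)}"

def check(path: str, expires: int, token: str) -> bool:
    """Static-server side: reject expired or forged tokens."""
    if expires < int(time.time()):
        return False
    return hmac.compare_digest(sign(path, expires), token)
```

The static server needs no round-trip to the app: the shared secret alone lets it verify that the app authorized this path for this time window.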
As a point to the OP: you could just put perlbal or nginx behind haproxy. This is effectively what we do.

> I don't agree, I even think it's the opposite. With your method you can't
> control it, because once the server has emitted its redirect, you don't
> know if the user gets the entire file or not. With a token, either the
> static server can talk to the application server to check for
> authorization, or it can store local information about the downloads
> in progress and their statuses.
>
> If you only want to work on a single site, you could even do that with
> an LDAP or RADIUS auth server. Your app then has to create dynamic
> accounts and your static servers rely on them.

I think the devil's advocate point here is that the difference in the amount of work between issuing a blind reproxy in a scalable service and having application servers stream the whole file is pretty big. It's also possible to reconstitute that information from logs. It doesn't seem like it's ever been worth the effort of sending large files from memory/CPU-heavy application servers.

> I have another question BTW. I know some large download sites running
> either on NFS farms or with decentralized nodes (one node = storage+app).
> I still fail to see the advantage of your solution compared to these
> methods. If your concern is to have a large capacity online, then the
> NFS servers can provide that. If the bandwidth is the concern, then
> your solution is still limited to one site (one internet link, and
> one or even a few LBs). And within a single server, whatever the
> techno you're using, you don't even need to maintain a thread during
> the transfer, you can pass the file descriptor to another process
> which will take care of sending it using sendfile().

Something like MogileFS is an implementation of the "storage+app" NFS server thing. The tracker system automatically manages the distributed storage nodes.
Adding new servers requires some minimal software and a few commands to add them to the cluster, and files are then put there and served from there. Want to retire an old server? Just drain off the files and mark the devices/host as dead. MogileFS was originally written because good NFS systems are expensive, tend to have tight vendor lock-in, and vastly underperform something simpler. I've assembled MogileFS clusters from hardware on the floor which have outperformed $250k+ OnStor or BlueArc NFS filers. People can do it either way, and both work fine for many people.
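To make the blind-reproxy hand-off from earlier in the thread concrete: the application server authorizes the request, then returns only a header naming an internal file, and the front end streams the bytes. A minimal WSGI sketch, using nginx's `X-Accel-Redirect` convention (perlbal's equivalent header is `X-REPROXY-URL`); `valid_token` and the paths are placeholders, not anything from the thread:

```python
def valid_token(query_string: str) -> bool:
    # Placeholder authorization check; a real app would verify a signed
    # token or a session here.
    return "token=good" in query_string

def app(environ, start_response):
    """WSGI app that delegates the actual file transfer to the proxy."""
    if not valid_token(environ.get("QUERY_STRING", "")):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"denied"]
    # Empty body: nginx replaces it with the file at this internal
    # location, so the app worker is freed immediately instead of
    # pushing gigabytes through a memory/CPU-heavy process.
    start_response("200 OK", [("X-Accel-Redirect", "/protected/big.iso")])
    return [b""]
```

The trade-off the quoted poster raises still holds: once this response is emitted, the app no longer knows whether the client received the whole file, and that has to be reconstituted from the front end's logs.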

