Mark Smith wrote:
I want to use MogileFS to serve files to remote clients via HTTP. Ideally
the files should be accessed directly from the storage nodes by the
clients. MogileFS seems good for this since all file access is via HTTP,
and it works nicely with robust HTTP servers (like lighty).
Problem is that I don't want the remote clients to see the actual
MogileFS file path when accessing the files, and I want some security so
that not just any client can access any file. So instead of providing the
client with something like "http://myserver.com/dev1/0/000/000/000000001.fid",
I want the client to access some name I generate (for example with
lighty's mod_secdownload).
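For reference, the mod_secdownload scheme I have in mind protects a URL with an
MD5 token over a shared secret, the relative path, and a hex timestamp. A rough
Python sketch of generating such a URL (I'm assuming the classic MD5 variant;
the secret and prefix here are placeholders that would have to match
secdownload.secret and secdownload.uri-prefix in the lighttpd config):

```python
import hashlib
import time

def secdownload_url(secret, rel_path, prefix="/dl/", now=None):
    """Build a mod_secdownload-style protected URL.

    rel_path must start with "/"; "secret" and "prefix" are hypothetical
    values that must match the lighttpd configuration.
    """
    timestamp_hex = "%08x" % int(now if now is not None else time.time())
    # Token = md5(secret + relative path + hex timestamp), hex-encoded.
    token = hashlib.md5((secret + rel_path + timestamp_hex).encode()).hexdigest()
    return "%s%s/%s%s" % (prefix, token, timestamp_hex, rel_path)
```

The resulting URL carries everything the server needs to re-check the token, so
no state has to be shared beyond the secret.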
The question is how I would go about configuring the storage nodes to do
this. Is there any such built-in functionality in MogileFS?
That's not how MogileFS is traditionally used in a web environment.
The idea is that you run your MogileFS network internally and then
expose to the user something else that uses the MogileFS trackers to
translate your-names to internal-names. The storage nodes are then
never exposed to the end users.
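To make the your-names-to-internal-names step concrete: the tracker speaks a
simple line-based text protocol on its TCP port, and a path lookup is one
request/response exchange. A rough Python sketch of building and parsing a
get_paths exchange (command and argument names as I remember them, so verify
against the MogileFS client sources; the connect helper is hypothetical):

```python
import socket
from urllib.parse import urlencode, parse_qs

def build_get_paths(domain, key):
    # One tracker command: verb, space, URL-encoded args, CRLF.
    return ("get_paths " + urlencode({"domain": domain, "key": key}) + "\r\n").encode()

def parse_tracker_reply(raw):
    # Replies look like "OK paths=2&path1=...&path2=..." or "ERR <code> <msg>".
    line = raw.decode().strip()
    if not line.startswith("OK "):
        raise RuntimeError("tracker error: " + line)
    args = {k: v[0] for k, v in parse_qs(line[3:]).items()}
    return [args["path%d" % i] for i in range(1, int(args.get("paths", 0)) + 1)]

def get_paths(tracker_addr, domain, key):
    # Hypothetical helper: ask one tracker which storage-node URLs hold a key.
    with socket.create_connection(tracker_addr) as s:
        s.sendall(build_get_paths(domain, key))
        return parse_tracker_reply(s.makefile("rb").readline())
```

Your front end maps its public names to (domain, key) pairs however it likes;
the paths that come back are the internal storage-node URLs.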
This setup gives you the advantage of controlling path caching in
your application (you know best when certain items should expire, if
ever), as well as the flexibility to properly handle fallback when a
storage node is unavailable. It's also safer: you don't have to plan
for your storage nodes to have separate upload and download
processes.
Anyway ... apparently we don't have "best practices" setup information
on the site or wiki... that's lame. Well, typically MogileFS is
combined with Perlbal, as the latter does most of the heavy lifting for
using MogileFS. You still need your application servers to do a path
lookup and decide on caching policies (if you want to enable path
caching), but you don't need to do any file serving there; Perlbal
will do it from the storage nodes.
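For the record, that Perlbal setup usually takes this shape: a reverse_proxy
service with reproxying enabled, where the app server answers the path lookup
with an X-REPROXY-URL header pointing at the storage node instead of sending
the file body itself. A sketch of the config (host/port values are
placeholders):

```
CREATE POOL webapp
  POOL webapp ADD 10.0.0.1:8080

CREATE SERVICE balancer
  SET listen         = 0.0.0.0:80
  SET role           = reverse_proxy
  SET pool           = webapp
  SET enable_reproxy = on
ENABLE balancer
```

With enable_reproxy on, Perlbal sees the X-REPROXY-URL header in the app's
response and streams the file from the storage node to the client, so the app
server never touches the file data.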
I have a strong feeling that this is just going to be more confusing
than it was helpful, and I know our documentation is horrendous for
beginners, so please reply (to the list!) with questions and we'll get
you going. :)
The problem I'm trying to avoid by serving the files directly from the
storage nodes is overloading some dedicated machines with the work of
"proxying" data between the storage node and the end user. I also don't
want the extra traffic on my local network. Path caching can still be
done by whoever provides the clients with the URLs to the files. I can
easily avoid hitting the trackers too much by caching these paths, but
at the point when a client wants to download a file, accessing the
storage node directly seems like the most efficient solution to me.
One option is installing a second HTTP server on each storage node; it
would do whatever translation or security I want and then serve the
local files upon request from the remote client. The remote client would
know which storage node to go to by talking to some web app that uses
the trackers to find which storage nodes have the requested file; this
app can also do path caching if required.
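To sketch that first option: the second server on the storage node only has to
recompute the same token the URL generator used and refuse the request on
mismatch or expiry, then serve the file from the local MogileFS device path. A
hypothetical check, assuming an md5(secret + path + hex-timestamp) token like
lighttpd's mod_secdownload uses (all names made up):

```python
import hashlib
import time

def check_token(secret, path_info, max_age=60, now=None):
    """Validate a "/<md5-token>/<hex-timestamp>/<rel-path>" request path.

    Returns the relative file path to serve if the token is valid and
    fresh, or None if the request should be rejected with a 403.
    """
    try:
        _, token, timestamp_hex, rel = path_info.split("/", 3)
        issued = int(timestamp_hex, 16)
    except ValueError:
        return None
    rel_path = "/" + rel
    # Recompute the token the generator would have produced.
    expected = hashlib.md5((secret + rel_path + timestamp_hex).encode()).hexdigest()
    if token != expected:
        return None
    now = time.time() if now is None else now
    if now - issued > max_age:
        return None
    return rel_path
```

On success the server would map rel_path under the local MogileFS document
root and serve it; on None it returns 403 without revealing anything about
the path layout.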
Alternatively, I could do everything from the already existing web server
on each storage node by hacking my way through the MogileFS sources to
add some file-name security options where it configures the storage
node's web server. If this seems logical and a better solution than
running two web servers on each storage node, then I might actually do
it (assuming some support from "the experts" will be available).
Finally, I'd like to ask whether there's any preference as to which web
server to run on the storage nodes (lighttpd/apache/perlbal), and what
the original intention was behind this flexibility.