Re: mogstored dying: redux
On May 20, 2008, at 11:27 AM, Mark Smith wrote:

Hi all, I very much appreciate the patient help and advice, but I'm still having trouble getting even small files stored in my mogile setup.

Given the error message you've pasted (403?) this seems like a configuration/setup problem. Are you sure that your MogileFS setup is even working at all, even without touching mogtool? Well, it's easy to figure out if it is or not. Here, this little script: ---

If the process fails, can you copy the output of it and paste on the mailing list here? There should be a lot of text for all of the work that the library is doing that will tell you what's going on. Or anyway, will tell us what's going on, I don't expect most of it to make sense unless you know the internals of MogileFS. :)

Thanks Mark. The test script worked fine. The 403 errors were only occurring with lighttpd used in place of perlbal. This was a suggestion (Ask's) which seemed like a good thing to try, but lighttpd actually made things worse. With lighttpd, about 1 in 5 requests failed to store, or failed to close.

I've now reverted back to the standard mogstored/perlbal config, and it's *mostly* working, but I'm concerned about the frequency of mogstored just plain dying... I have to keep a keepalive script running that checks my 16 storage nodes every 5 min and relaunches any mogstored procs that have mysteriously stopped running.

I'm also worried about intermittent problems when pushing large numbers of files (currently using mogtool). I'm not sure if this corresponds to mogstored dying, or trying to hit a dead node before the restart kicks in, or what. The errors given out by mogtool in these intermittent cases are one of these:

MogileFS backend error message: unknown_key unknown_key
System error message: MogileFS::NewHTTPFile: unable to write to any allocated storage node at /usr/lib64/perl5/5.8.5/x86_64-linux-thread-multi/IO/Handle.pm line 399
System error message: Close failed at /usr/bin/mogtool line 816, Sock_minime336:7001 line 215.

I can live with transmit errors once in a while, and for now mogtool seems to be retrying and recovering. But if they crash the storage node, that's a showstopper.

If it's not normal for mogstored to just die like that, I will spend some time trying to figure out why that is. If it *is* normal for mogstored to just die sometimes, I need to get rid of it quickly and get lighttpd over its intermittent 403 problems. I don't think I have time to do both, so I need to pick the direction that's more likely to succeed. My time to evaluate this solution for our application is running out quickly.

Thanks again for the replies. I would be lost without the help from the list (which probably means the documentation is weak and puny, but c'est la vie).
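For readers who want to reproduce the kind of keepalive check described above, here is a minimal sketch. It only tests whether something accepts TCP connections on each storage node; it is not Greg's actual script, the node list is hypothetical, and port 7500 is assumed to be the mogstored HTTP port on the nodes in question.

```python
import socket

# Hypothetical node list; 7500 is an assumption about where mogstored listens.
STORAGE_NODES = [("127.0.0.1", 7500)]


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def dead_nodes(nodes, timeout=2.0):
    """Return the (host, port) pairs that did not accept a connection."""
    return [(h, p) for (h, p) in nodes if not is_listening(h, p, timeout)]
```

Run from cron every few minutes, `dead_nodes(STORAGE_NODES)` would give the list of nodes whose mogstored needs relaunching. A successful connect only shows that the port is open, not that stores succeed, so it will not catch every failure mode discussed in this thread.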
Re: mogstored dying: redux
On May 21, 2008, at 3:17, Greg Connor wrote:

Thanks Mark. The test script worked fine. The 403 errors were only occurring with lighttpd used in place of perlbal. This was a suggestion (Ask's) which seemed like a good thing to try, but lighttpd actually made things worse. With lighttpd, about 1 in 5 requests failed to store, or failed to close.

Oh, I'm sorry. I realize now that the "make lighttpd work" patch was never committed, darn. Try the patch below.

http://lists.danga.com/pipermail/mogilefs/2007-November/001401.html

--- server/lib/MogileFS/Device.pm (revision 1177)
+++ server/lib/MogileFS/Device.pm (working copy)
@@ -371,7 +371,7 @@
     my $ans = <$sock>;

     # if they don't support this method, remember that
-    if ($ans && $ans =~ m!HTTP/1\.[01] (400|405|501)!) {
+    if ($ans && $ans =~ m!HTTP/1\.[01] (400|501)!) {
         $self->{no_mkcol} = 1;
         # TODO: move this into method on device, which propogates to parent
         # and also receive from parent. so all query workers share this knowledge

--
http://develooper.com/ - http://askask.com/
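The effect of the one-line change can be seen by running both patterns over a few status lines. The sketch below transcribes the two regular expressions from the patch into Python; the sample status lines are made up for illustration, and the helper name is hypothetical. One plausible reading of the change (not stated in the thread) is that a WebDAV server may answer 405 to MKCOL when the directory already exists, so treating 405 as "MKCOL unsupported" would wrongly disable directory creation.

```python
import re

# Patterns transcribed from the patch; Perl's m!...! is an unanchored search.
OLD_UNSUPPORTED = re.compile(r"HTTP/1\.[01] (400|405|501)")
NEW_UNSUPPORTED = re.compile(r"HTTP/1\.[01] (400|501)")


def mkcol_unsupported(status_line, pattern):
    """Mimic `$ans && $ans =~ ...`: an empty or missing answer never matches."""
    return bool(status_line) and pattern.search(status_line) is not None


# Made-up sample responses to a MKCOL request.
for line in ("HTTP/1.1 201 Created",
             "HTTP/1.1 405 Method Not Allowed",
             "HTTP/1.1 501 Not Implemented"):
    print(line,
          "| old:", mkcol_unsupported(line, OLD_UNSUPPORTED),
          "| new:", mkcol_unsupported(line, NEW_UNSUPPORTED))
```

Only the 405 line is classified differently by the two patterns: the old one sets the equivalent of `no_mkcol`, the new one does not.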
Re: mogstored dying: redux
Greg Connor wrote:

MogileFS backend error message: unknown_key unknown_key
System error message: Close failed at /usr/bin/mogtool line 816, Sock_minime336:7001 line 78.
This was try #1 and it's been 1.06 seconds since we first tried. Retrying...

I am also seeing a large number of these errors:

System error message: MogileFS::Backend: tracker socket never became readable (minime336:7001) when sending command: [create_open domain=dbbackups&fid=0&class=dbbackups-recent&multi_dest=1&key=dwh-20080519-vol9,99 ] at /usr/lib/perl5/site_perl/5.8.5/MogileFS/Client.pm line 268
Close failed at /usr/bin/mogtool line 816
unable to write to any allocated storage node at /usr/lib64/perl5/5.8.5/x86_64-linux-thread-multi/IO/Handle.pm line 399
Connection reset by peer
tracker socket never became readable
socket closed on read at /usr/lib/perl5/site_perl/5.8.5/MogileFS/NewHTTPFile.pm line 335
couldn't connect to mogilefsd backend at /usr/lib/perl5/site_perl/5.8.5/MogileFS/Client.pm line 268

Greg, from a superficial look, all of these errors seem to be networking related: failing socket calls and connectivity issues. You may want to check for packet loss and latency issues on your network. It may even be something as simple as a bad switch or cable somewhere, or somebody else intermittently pushing a lot of traffic through your local LAN while you're testing (which I assume is on a GBit network, right?).

Anyway, something to look at.

--
Arthur Bebak [EMAIL PROTECTED]
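The "This was try #1 ... Retrying..." line in the log above shows that mogtool retries failed stores. How mogtool schedules its retries is not described in this thread, so the following is only a generic sketch of retrying a network operation with exponential backoff; the function name, attempt count and delays are all assumptions chosen for illustration.

```python
import time


def retry(operation, attempts=5, first_delay=1.0, factor=2.0, sleep=time.sleep):
    """Call operation() until it succeeds or `attempts` tries have failed.

    Only OSError (which includes connection resets and timeouts) is retried.
    The delay grows by `factor` after every failed try.
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError as err:
            if attempt == attempts:
                raise  # out of tries: let the caller see the last error
            print(f"try #{attempt} failed ({err}); retrying in {delay:.2f}s")
            sleep(delay)
            delay *= factor
```

Retries like this hide intermittent failures but do not fix them; if the underlying cause is packet loss or a dying storage daemon, the error rate is still worth measuring separately.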
Re: mogstored dying: redux
Hi all, I very much appreciate the patient help and advice, but I'm still having trouble getting even small files stored in my mogile setup.

Given the error message you've pasted (403?) this seems like a configuration/setup problem. Are you sure that your MogileFS setup is even working at all, even without touching mogtool? Well, it's easy to figure out if it is or not. Here, this little script:

---
use MogileFS::Client;

# print the client library's debugging output
$MogileFS::DEBUG = 1;

my $mogc = MogileFS::Client->new(
    domain => "foo.com::my_namespace",
    hosts  => ['10.0.0.2:1234'],
);

# store a tiny file under some_key
my $fh = $mogc->new_file("some_key", "some_class");
print $fh "test";
unless ($fh->close) {
    die "Error writing file: " . $mogc->errcode . ": " . $mogc->errstr . "\n";
}

sleep 5;

# look up where the file ended up, then clean up
my @urls = $mogc->get_paths("some_key");
print "path: $_\n" foreach @urls;

$mogc->delete("some_key");
---

Take that, put it on a machine that has the MogileFS client libraries, and change the values it's using to connect to the server to point at your tracker. Then put in a valid class instead of some_class and give it a shot. Does it work? Do you get paths printed?

(I haven't tested this script, so you might need to kick it a little if there are any syntax errors and the like. Just kinda tossed it together.)

If the process fails, can you copy the output of it and paste on the mailing list here? There should be a lot of text for all of the work that the library is doing that will tell you what's going on. Or anyway, will tell us what's going on, I don't expect most of it to make sense unless you know the internals of MogileFS. :)

Thanks!

--
Mark Smith / xb95 [EMAIL PROTECTED]
Re: mogstored dying: redux
Hi,

In my experience WebDAV storage setups (lighttpd, nginx) are much better at handling large chunks/files than mogstored. I use nginx in a production environment with files ranging from a couple of bytes to a gigabyte, no problem. In the pre-production tests I ran, mogstored died reliably with OOMs when handling 100MB+ files. Use mogstored only to manage the usage stats on your storage nodes in that case.

Gr,
Andy

On Mon, May 19, 2008 at 3:25 AM, Greg Connor [EMAIL PROTECTED] wrote:

On May 18, 2008, at 5:59 PM, Ask Bjørn Hansen wrote:

On May 18, 2008, at 17:54, Greg Connor wrote:

Running. Out of memory! Out of memory!

Yikes. 64MB chunks shouldn't be that bad. Are the storage nodes otherwise loaded (high IO wait or some such)?

Nope, the storage nodes are doing nothing other than mogstored at this time.

Did you try using another HTTP server (lighttpd, nginx, apache, ...) for the file transfers to the storage nodes? I suspect most/many users use that so mogstored doesn't get used that much in high traffic environments ...

No, I have not tried this. Do you believe mogstored is pretty useless in a production environment? If that's true and widely known, it's too bad the documents don't reflect this...

Is there a document or list posting that explains what parts of mogilefs should be tuned (or outright replaced) for a high-traffic application? Are there documents stashed somewhere that I'm missing? I looked at the new wiki (last updates about 5 and 10 months ago) and read everything available there, and I've read most of the man pages. I keep finding stuff that I'm totally not getting.

I would welcome some advice or pointers on how to get apache set up to replace mogstored for file transfers...
Re: mogstored dying: redux
Andy Lo A Foe wrote:

Hi, In my experience WebDAV storage setups (lighttpd, nginx) are much better at handling large chunks/files than mogstored. I use nginx in a production environment with files ranging from a couple of bytes to a gigabyte, no problem. In the pre-production tests I ran, mogstored died reliably with OOMs when handling 100MB+ files. Use mogstored only to manage the usage stats on your storage nodes in that case.

Hi Andy, thanks for the reply. Do you feel nginx is better than lighttpd for this? How about apache?

Is it simply a matter of having the other httpd listen on another port, and entering that port number in a config file? Did you have to do anything special to configure httpd (for example, to automatically create directories that don't yet exist for PUT requests?)

thanks again
Re: mogstored dying: redux
We've been using lighttpd, and it works OK. We have run into problems with the default mogile-generated config not being able to fully utilize the devices. I *think* we have that solved now though. We also saw possible stat caching issues around new dir creation.

server.stat-cache-engine = "disable"
server.network-backend   = "linux-sendfile"
server.event-handler     = "linux-sysepoll"
server.max-worker        = 8

lighttpd-1.4.15

--Justin

Greg Connor wrote:

Andy Lo A Foe wrote:

Hi, In my experience WebDAV storage setups (lighttpd, nginx) are much better at handling large chunks/files than mogstored. I use nginx in a production environment with files ranging from a couple of bytes to a gigabyte, no problem. In the pre-production tests I ran, mogstored died reliably with OOMs when handling 100MB+ files. Use mogstored only to manage the usage stats on your storage nodes in that case.

Hi Andy, thanks for the reply. Do you feel nginx is better than lighttpd for this? How about apache?

Is it simply a matter of having the other httpd listen on another port, and entering that port number in a config file? Did you have to do anything special to configure httpd (for example, to automatically create directories that don't yet exist for PUT requests?)

thanks again
Re: mogstored dying: redux
On May 19, 2008, at 8:49 AM, Greg Connor wrote:

Is it simply a matter of having the other httpd listen on another port, and entering that port number in a config file? Did you have to do anything special to configure httpd (for example, to automatically create directories that don't yet exist for PUT requests?)

Enabling WebDAV should do that. However, mogilefs should be able to configure at least apache and lighttpd automatically. Be sure to use svn trunk, as there were some fixes to some of that recently:

http://code.sixapart.com/svn/mogilefs/trunk/server/CHANGES

- ask

--
http://develooper.com/ - http://askask.com/
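On the question of directories that don't yet exist: with WebDAV, a client that wants to PUT a file into a missing directory generally has to create each missing parent first with MKCOL, which is why the WebDAV module matters here. The sketch below only computes which collections would need to exist for a given path, shallowest first. The helper name is hypothetical and the example path layout is an illustrative assumption, not something taken from this thread.

```python
def parent_collections(path):
    """Return every ancestor directory of `path`, shallowest first.

    Each entry ends in '/' so it can be used directly as a MKCOL target.
    """
    parts = [p for p in path.split("/") if p]
    # Stop before the last component: that one is the file itself.
    return ["/" + "/".join(parts[:i]) + "/" for i in range(1, len(parts))]


# Illustrative path only; the real on-disk layout may differ.
for collection in parent_collections("/dev1/0/000/000/0000000123.fid"):
    print("MKCOL", collection)
```

A client would issue MKCOL for each entry in order and then PUT the file. A MKCOL for a directory that already exists is expected to be refused, so a refusal at that step does not by itself mean the server lacks WebDAV support.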
Re: mogstored dying: redux
On May 18, 2008, at 5:59 PM, Ask Bjørn Hansen wrote:

On May 18, 2008, at 17:54, Greg Connor wrote:

Running. Out of memory! Out of memory!

Yikes. 64MB chunks shouldn't be that bad. Are the storage nodes otherwise loaded (high IO wait or some such)?

Nope, the storage nodes are doing nothing other than mogstored at this time.

Did you try using another HTTP server (lighttpd, nginx, apache, ...) for the file transfers to the storage nodes? I suspect most/many users use that so mogstored doesn't get used that much in high traffic environments ...

No, I have not tried this. Do you believe mogstored is pretty useless in a production environment? If that's true and widely known, it's too bad the documents don't reflect this...

Is there a document or list posting that explains what parts of mogilefs should be tuned (or outright replaced) for a high-traffic application? Are there documents stashed somewhere that I'm missing? I looked at the new wiki (last updates about 5 and 10 months ago) and read everything available there, and I've read most of the man pages. I keep finding stuff that I'm totally not getting.

I would welcome some advice or pointers on how to get apache set up to replace mogstored for file transfers...