Bringing it back to lighttpd + mogstored issues (was: mogstored dying: redux)
hi Greg,

I would be interested to hear what your results are with the patch below. I had identical problems to what you described earlier when using lighttpd + mogstored, and I'm 99.99% sure I used the patch below (added manually to the 2.17 release). I didn't go to much effort to debug it, since the failures only seemed to occur under production traffic. Everything worked fine in our test MogileFS setup with lighttpd + mogstored.

Are there other fixes for lighttpd + mogstored in the SVN trunk that are not in 2.17? I did see this in the changelog:

    RFC 2518 says we should use a trailing slash when calling MKCOL.
    Some servers (nginx) appears to require it. (Spotted by Timu Eren)

At the moment I have a sentinel script that monitors mogstored for excessive memory usage on each storage node and, if necessary, marks the host 'down', restarts mogstored, and marks it 'alive' again. So far that's been an acceptable bandaid for us, but I would love to have the storage nodes run efficiently.

cheers,

- Jared

--
Jared Klett
Co-founder, blip.tv
office: 917.546.6989 x2002
mobile: 646.526.8948
aol im: JaredAtWrok
http://blog.blip.tv

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Ask Bjørn Hansen
Sent: Wednesday, May 21, 2008 8:20 AM
To: Greg Connor
Cc: mogilefs@lists.danga.com
Subject: Re: mogstored dying: redux

On May 21, 2008, at 3:17, Greg Connor wrote:

> Thanks Mark. The test script worked fine. The 403 errors were only
> occurring with lighttpd used in place of perlbal. This was a
> suggestion (Ask's) which seemed like a good thing to try, but lighttpd
> actually made things worse. With lighttpd, about 1 in 5 requests
> failed to store, or failed to close.

Oh, I'm sorry. I realize now that the "make lighttpd work" patch was never committed, darn. Try the patch below.

http://lists.danga.com/pipermail/mogilefs/2007-November/001401.html

--- server/lib/MogileFS/Device.pm       (revision 1177)
+++ server/lib/MogileFS/Device.pm       (working copy)
@@ -371,7 +371,7 @@
         my $ans = <$sock>;

         # if they don't support this method, remember that
-        if ($ans && $ans =~ m!HTTP/1\.[01] (400|405|501)!) {
+        if ($ans && $ans =~ m!HTTP/1\.[01] (400|501)!) {
             $self->{no_mkcol} = 1;
             # TODO: move this into method on device, which propogates to parent
             # and also receive from parent.  so all query workers share this knowledge

--
http://develooper.com/ - http://askask.com/
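The sentinel approach Jared describes could look something like the minimal shell sketch below. The RSS limit, the `mogadm host mark` invocations, and the process-lookup details are assumptions for illustration, not his actual script:

```shell
#!/bin/sh
# Hypothetical memory sentinel for mogstored: if its resident set size
# grows past a limit, mark the host down, restart mogstored, mark it
# alive again. Threshold and commands are illustrative assumptions.

LIMIT_KB=524288   # ~512MB

# rss_kb PID -> resident set size in KB
rss_kb() {
    ps -o rss= -p "$1" | tr -d ' '
}

check_mogstored() {
    pid=$(pgrep -o -x mogstored) || return 0   # not running; nothing to do
    rss=$(rss_kb "$pid")
    if [ "$rss" -gt "$LIMIT_KB" ]; then
        host=$(hostname -s)
        mogadm host mark "$host" down
        kill "$pid" && mogstored --daemonize
        mogadm host mark "$host" alive
    fi
}

check_mogstored
```

Run from cron (or a loop) on each storage node; it is a bandaid in the same spirit as Jared's, not a fix for the underlying OOM.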
Re: mogstored dying: redux
On May 20, 2008, at 11:27 AM, Mark Smith wrote:

>> Hi all, I very much appreciate the patient help and advice, but I'm
>> still having trouble getting even small files stored in my mogile
>> setup.
>
> Given the error message you've pasted (403?) this seems like a
> configuration/setup problem. Are you sure that your MogileFS setup is
> even working at all, even without touching mogtool? Well, it's easy to
> figure out if it is or not. Here, this little script:
>
> ---
>
> If the process fails, can you copy the output of it and paste on the
> mailing list here? There should be a lot of text for all of the work
> that the library is doing that will tell you what's going on. Or
> anyway, will tell us what's going on, I don't expect most of it to
> make sense unless you know the internals of MogileFS. :)

Thanks Mark. The test script worked fine. The 403 errors were only occurring with lighttpd used in place of perlbal. This was a suggestion (Ask's) which seemed like a good thing to try, but lighttpd actually made things worse. With lighttpd, about 1 in 5 requests failed to store, or failed to close.

I've now reverted back to the standard mogstored/perlbal config, and it's *mostly* working, but I'm concerned about the frequency of mogstored just plain dying... I have to keep a keepalive script running that checks my 16 storage nodes every 5 minutes and relaunches any mogstored procs that have mysteriously stopped running. I'm also worried about intermittent problems when pushing large numbers of files (currently using mogtool). I'm not sure if this corresponds to mogstored dying, or trying to hit a dead node before the restart kicks in, or what.

The errors given out by mogtool in these intermittent cases are one of these:

MogileFS backend error message: unknown_key unknown_key
System error message: MogileFS::NewHTTPFile: unable to write to any
allocated storage node at
/usr/lib64/perl5/5.8.5/x86_64-linux-thread-multi/IO/Handle.pm line 399

System error message: Close failed at /usr/bin/mogtool line 816,
<Sock_minime336:7001> line 215.

I can live with transmit errors once in a while, and for now mogtool seems to be retrying and recovering. But if they crash the storage node, that's a showstopper. If it's not normal for mogstored to just die like that, I will spend some time trying to figure out why. If it *is* normal for mogstored to just die sometimes, I need to get rid of it quickly and get lighttpd over its intermittent 403 problems. I don't think I have time to do both, so I need to pick the direction that's more likely to succeed. My time to evaluate this solution for our application is running out quickly.

Thanks again for the replies. I would be lost without the help from the list (which probably means the documentation is weak and puny, but c'est la vie).
Re: mogstored dying: redux
On May 21, 2008, at 3:17, Greg Connor wrote:

> Thanks Mark. The test script worked fine. The 403 errors were only
> occurring with lighttpd used in place of perlbal. This was a
> suggestion (Ask's) which seemed like a good thing to try, but lighttpd
> actually made things worse. With lighttpd, about 1 in 5 requests
> failed to store, or failed to close.

Oh, I'm sorry. I realize now that the "make lighttpd work" patch was never committed, darn. Try the patch below.

http://lists.danga.com/pipermail/mogilefs/2007-November/001401.html

--- server/lib/MogileFS/Device.pm       (revision 1177)
+++ server/lib/MogileFS/Device.pm       (working copy)
@@ -371,7 +371,7 @@
         my $ans = <$sock>;

         # if they don't support this method, remember that
-        if ($ans && $ans =~ m!HTTP/1\.[01] (400|405|501)!) {
+        if ($ans && $ans =~ m!HTTP/1\.[01] (400|501)!) {
             $self->{no_mkcol} = 1;
             # TODO: move this into method on device, which propogates to parent
             # and also receive from parent.  so all query workers share this knowledge

--
http://develooper.com/ - http://askask.com/
Re: mogstored dying: redux
Greg Connor wrote:

> MogileFS backend error message: unknown_key unknown_key
> System error message: Close failed at /usr/bin/mogtool line 816,
> <Sock_minime336:7001> line 78.
>
> This was try #1 and it's been 1.06 seconds since we first tried.
> Retrying...
>
> I am also seeing a large number of these errors:
>
> System error message: MogileFS::Backend: tracker socket never became
> readable (minime336:7001) when sending command: [create_open
> domain=dbbackups&fid=0&class=dbbackups-recent&multi_dest=1&key=dwh-20080519-vol9,99
> ] at /usr/lib/perl5/site_perl/5.8.5/MogileFS/Client.pm line 268
>
> Close failed at /usr/bin/mogtool line 816
> unable to write to any allocated storage node at
> /usr/lib64/perl5/5.8.5/x86_64-linux-thread-multi/IO/Handle.pm line 399
> Connection reset by peer
> tracker socket never became readable
> socket closed on read at
> /usr/lib/perl5/site_perl/5.8.5/MogileFS/NewHTTPFile.pm line 335
> couldn't connect to mogilefsd backend at
> /usr/lib/perl5/site_perl/5.8.5/MogileFS/Client.pm line 268

Greg, superficially looking at this it seems that all the errors are networking related, with failing socket calls and connectivity issues. You may want to check for packet loss on your network and for latency issues. It may even be something as simple as a bad switch/cable somewhere, or somebody else intermittently pushing a lot of traffic through your local LAN when you're testing (which I assume is on a GBit network, right?). Anyway, something to look at.

--
Arthur Bebak
[EMAIL PROTECTED]
Re: mogstored dying: redux
> Hi all, I very much appreciate the patient help and advice, but I'm
> still having trouble getting even small files stored in my mogile
> setup.

Given the error message you've pasted (403?) this seems like a configuration/setup problem. Are you sure that your MogileFS setup is even working at all, even without touching mogtool? Well, it's easy to figure out if it is or not. Here, this little script:

---
use MogileFS::Client;

$MogileFS::DEBUG = 1;

my $mogc = MogileFS::Client->new(
    domain => "foo.com::my_namespace",
    hosts  => ['10.0.0.2:1234'],
);

my $fh = $mogc->new_file("some_key", "some_class");
print $fh "test";
unless ($fh->close) {
    die "Error writing file: " . $mogc->errcode . ": " . $mogc->errstr . "\n";
}

sleep 5;

my @urls = $mogc->get_paths("some_key");
print "path: $_\n" foreach @urls;

$mogc->delete("some_key");
---

Take that, put it on a machine that has the MogileFS client libraries, and change the values it's using to connect to the server to point at your tracker. Then put in a valid class instead of "some_class" and give it a shot. Does it work? Do you get paths printed?

(I haven't tested this script, so you might need to kick it a little if there are any syntax errors and the like. Just kinda tossed it together.)

If the process fails, can you copy the output of it and paste it on the mailing list here? There should be a lot of text for all of the work that the library is doing that will tell you what's going on. Or anyway, will tell us what's going on; I don't expect most of it to make sense unless you know the internals of MogileFS. :)

Thanks!

--
Mark Smith / xb95
[EMAIL PROTECTED]
Re: mogstored dying: redux
Hi,

In my experience WebDAV storage setups (lighttpd, nginx) are much better at handling large chunks/files than mogstored. I use nginx in a production environment with files ranging from a couple of bytes to a gigabyte, no problem. In the pre-production tests I ran, mogstored died reliably with OOMs when handling 100MB+ files. Use mogstored only to manage the usage stats on your storage nodes in that case.

Gr,
Andy

On Mon, May 19, 2008 at 3:25 AM, Greg Connor [EMAIL PROTECTED] wrote:

> On May 18, 2008, at 5:59 PM, Ask Bjørn Hansen wrote:
>> On May 18, 2008, at 17:54, Greg Connor wrote:
>>> Running.
>>> Out of memory!
>>> Out of memory!
>>
>> Yikes. 64MB chunks shouldn't be that bad. Are the storage nodes
>> otherwise loaded (high IO wait or some such)?
>
> Nope, the storage nodes are doing nothing other than mogstored at this
> time.
>
>> Did you try using another HTTP server (lighttpd, nginx, apache, ...)
>> for the file transfers to the storage nodes? I suspect most/many users
>> use that so mogstored doesn't get used that much in high traffic
>> environments ...
>
> No, I have not tried this. Do you believe mogstored is pretty useless
> in a production environment? If that's true and widely known, it's too
> bad the documents don't reflect this...
>
> Is there a document or list posting that explains what parts of
> mogilefs should be tuned (or outright replaced) for a high-traffic
> application? Are there documents stashed somewhere that I'm missing? I
> looked at the new wiki (last updates about 5 and 10 months ago) and
> read everything available there, and I've read most of the man pages.
> I keep finding stuff that I'm totally not getting. I would welcome
> some advice or pointers on how to get apache set up to replace
> mogstored for file transfers...
Re: mogstored dying: redux
Andy Lo A Foe wrote:

> Hi,
>
> In my experience WebDAV storage setups (lighttpd, nginx) are much
> better at handling large chunks/files than mogstored. I use nginx in a
> production environment with files ranging from a couple of bytes to a
> gigabyte, no problem. In the pre-production tests I ran, mogstored
> died reliably with OOMs when handling 100MB+ files. Use mogstored only
> to manage the usage stats on your storage nodes in that case.

Hi Andy, thanks for the reply. Do you feel nginx is better than lighttpd for this? How about apache? Is it simply a matter of having the other httpd listen on another port, and entering that port number in a config file? Did you have to do anything special to configure httpd (for example, to automatically create directories that don't yet exist for PUT requests)?

thanks again
Re: mogstored dying: redux
We've been using lighttpd, and it works OK. We have run into problems with the default mogile-generated config not being able to fully utilize the devices. I *think* we have that solved now though. We also saw possible stat caching issues around new dir creation.

server.stat-cache-engine = "disable"
server.network-backend   = "linux-sendfile"
server.event-handler     = "linux-sysepoll"
server.max-worker        = 8

lighttpd-1.4.15

--Justin

Greg Connor wrote:

> Andy Lo A Foe wrote:
>> Hi, In my experience WebDAV storage setups (lighttpd, nginx) are much
>> better at handling large chunks/files than mogstored. I use nginx in
>> a production environment with files ranging from a couple of bytes to
>> a gigabyte, no problem. In the pre-production tests I ran, mogstored
>> died reliably with OOMs when handling 100MB+ files. Use mogstored
>> only to manage the usage stats on your storage nodes in that case.
>
> Hi Andy, thanks for the reply. Do you feel nginx is better than
> lighttpd for this? How about apache? Is it simply a matter of having
> the other httpd listen on another port, and entering that port number
> in a config file? Did you have to do anything special to configure
> httpd (for example, to automatically create directories that don't yet
> exist for PUT requests)?
>
> thanks again
Re: mogstored dying: redux
On May 19, 2008, at 8:49 AM, Greg Connor wrote:

> Is it simply a matter of having the other httpd listen on another
> port, and entering that port number in a config file? Did you have to
> do anything special to configure httpd (for example, to automatically
> create directories that don't yet exist for PUT requests)?

Enabling WebDAV should do that -- however mogilefs should be able to configure at least apache and lighttpd automatically. Be sure to use svn trunk, as there were some fixes to some of that recently:

http://code.sixapart.com/svn/mogilefs/trunk/server/CHANGES

- ask

--
http://develooper.com/ - http://askask.com/
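For lighttpd specifically, the WebDAV side of such a setup is small. The fragment below is a hypothetical sketch for a storage node; the port, document root, and module list are example values only, and the config mogilefsd generates may differ:

---
server.port          = 7500
server.document-root = "/var/mogdata"
server.modules      += ( "mod_webdav" )

webdav.activate = "enable"
# PUTs from mogilefsd land under the document root; missing parent
# directories are created with MKCOL, which is why the server's
# responses to MKCOL matter (see the Device.pm patch earlier in
# this thread).
---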
mogstored dying: redux
I wrote a week or two ago and asked for help with my mogstored dying problem. Thanks to those who responded at that time.

Since then, I have upgraded all my nodes (16 storage nodes, with 2 also acting as trackers) to CentOS 5.1, which runs perl 5.8.8. (The client machine has perl 5.8.5.) I'm using the current subversion tree (1177) for trackers, storage nodes and clients/utils. Unfortunately I'm still having a problem with mogstored just dying, and I can't figure out why. Any help or pointers would be appreciated.

I'm currently using mogtool to push a large amount of data: 5 bigfiles with a total size of 2454G. I'm expecting that to be broken up into 39269 chunks of 64M each, and right now I've got about 19000 chunks copied.

My biggest problem right now is that mogstored just plain dies. It just stops, with no message to either syslog or to its output. Of my 16 nodes, they have all stopped running mogstored between 4 and 10 times. In order to keep the copy going, I have to check for mogstored running every minute and restart it if not running. The only thing appearing in syslog is after it starts up again, when it says "perlbal[pid]: beginning run".

The start script I have been using says --daemonize, so I ran mogstored without the --daemonize flag and got a bit more output:

Running.
Out of memory!
Out of memory!
Callback called exit.
Callback called exit.
END failed--call queue aborted.
beginning run
Running.

There's a bit more information in mogtool's output, but I don't know if these coincide with the mogstored crashes. Here are a few:

WARNING: Unable to save file 'collect-20080516-vol6,280': Close failed
at /usr/bin/mogtool line 816, <Sock_minime336:7001> line 283.
MogileFS backend error message: unknown_key unknown_key
System error message: Close failed at /usr/bin/mogtool line 816,
<Sock_minime336:7001> line 283.

WARNING: Unable to save file 'collect-20080516-vol6,311':
MogileFS::NewHTTPFile: error reading from node for device 337007:
Connection reset by peer at (eval 18) line 1
MogileFS backend error message: unknown_key unknown_key
System error message: MogileFS::NewHTTPFile: error reading from node
for device 337007: Connection reset by peer at (eval 18) line 1

WARNING: Unable to save file 'collect-20080516-vol6,1341':
MogileFS::NewHTTPFile: error writing to node for device 343012:
Connection reset by peer at
/usr/lib64/perl5/5.8.5/x86_64-linux-thread-multi/IO/Handle.pm line 399
MogileFS backend error message: unknown_key unknown_key
System error message: MogileFS::NewHTTPFile: error writing to node for
device 343012: Connection reset by peer at
/usr/lib64/perl5/5.8.5/x86_64-linux-thread-multi/IO/Handle.pm line 399

WARNING: Unable to save file 'collect-20080516-vol6,1736': Close failed
at /usr/bin/mogtool line 816, <Sock_minime336:7001> line 1739.
MogileFS backend error message: unknown_key unknown_key
System error message: Close failed at /usr/bin/mogtool line 816,
<Sock_minime336:7001> line 1739.

WARNING: Unable to save file 'collect-20080516-vol6,2373':
MogileFS::NewHTTPFile: unable to write to any allocated storage node at
/usr/lib64/perl5/5.8.5/x86_64-linux-thread-multi/IO/Handle.pm line 399
MogileFS backend error message: unknown_key unknown_key
System error message: MogileFS::NewHTTPFile: unable to write to any
allocated storage node at
/usr/lib64/perl5/5.8.5/x86_64-linux-thread-multi/IO/Handle.pm line 399

A few times I observed mogstored not responding to the tracker (mogadm check just pauses when listing that host), and in that case killing and restarting mogstored brings it back. I could probably check for this condition too, but now we're getting beyond a simple wrapper/restart/sentinel script. Is the experience of mogstored just plain dying a common one, or is it pretty rare?

If that were the only thing wrong, I could get around it by wrapping mogstored with a shell script that relaunches it as soon as it quits, but I'd rather not have to do that... I'd rather get at the root of the problem and make it not die in the first place.

A more important question I have is: am I trying to do something with MogileFS that it's totally not designed for? Is anyone else out there known to be using mogile for really huge files, chunked like mogtool does, and if so, were people happy with the results? If it's really minor problems, I could probably fix them myself, but I'm concerned that the lack of documentation about mogile's internals would hamper self-support efforts.

Thanks
gregc
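The relaunch-wrapper idea mentioned above can be as small as a shell loop. This is a generic sketch (the restart bound and messages are arbitrary choices, not an existing tool), and it treats the symptom rather than the OOM root cause:

```shell
#!/bin/sh
# Generic respawn loop: relaunch a command whenever it exits.
# First argument is the maximum number of launches (0 = forever);
# a real wrapper would also sleep briefly between restarts to avoid
# spinning if the command dies instantly.

respawn() {
    max=$1; shift
    n=0
    while [ "$max" -eq 0 ] || [ "$n" -lt "$max" ]; do
        "$@" || echo "command exited with status $?" >&2
        n=$((n + 1))
    done
    echo "launched $n time(s)"
}

# On a storage node this would be something like (foreground, i.e.
# without --daemonize, so the loop notices when it exits):
#   respawn 0 mogstored
```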
Re: mogstored dying: redux
On May 18, 2008, at 5:59 PM, Ask Bjørn Hansen wrote:

> On May 18, 2008, at 17:54, Greg Connor wrote:
>> Running.
>> Out of memory!
>> Out of memory!
>
> Yikes. 64MB chunks shouldn't be that bad. Are the storage nodes
> otherwise loaded (high IO wait or some such)?

Nope, the storage nodes are doing nothing other than mogstored at this time.

> Did you try using another HTTP server (lighttpd, nginx, apache, ...)
> for the file transfers to the storage nodes? I suspect most/many users
> use that so mogstored doesn't get used that much in high traffic
> environments ...

No, I have not tried this. Do you believe mogstored is pretty useless in a production environment? If that's true and widely known, it's too bad the documents don't reflect this...

Is there a document or list posting that explains what parts of mogilefs should be tuned (or outright replaced) for a high-traffic application? Are there documents stashed somewhere that I'm missing? I looked at the new wiki (last updates about 5 and 10 months ago) and read everything available there, and I've read most of the man pages. I keep finding stuff that I'm totally not getting.

I would welcome some advice or pointers on how to get apache set up to replace mogstored for file transfers...