Bringing it back to lighttpd + mogstored issues (was: mogstored dying: redux)

2008-05-22 Thread Jared Klett
hi Greg,

I would be interested to hear what your results are with the patch 
below. I had identical problems to what you described earlier when using 
lighttpd + mogstored, and I'm 99.99% sure I used the patch below (added 
manually to the 2.17 release).

I didn't go to much effort to debug it, since the failures only seemed 
to occur under production traffic. Everything worked fine in our test MogileFS 
setup with lighttpd + mogstored.

Are there other fixes for lighttpd + mogstored in the SVN trunk that 
are not in 2.17? I did see this in the changelog:

RFC 2518 says we should use a trailing slash when calling MKCOL. Some 
servers (nginx) appears to require it. (Spotted by Timu Eren).

At the moment I have a sentinel script that monitors mogstored for 
excessive memory usage on each storage node and marks 'down'/restarts 
mogstored/marks 'alive' if necessary. So far that's been an acceptable bandaid 
for us, but I would love to have the storage nodes run efficiently.
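
A minimal sketch of the decision logic such a sentinel might use, assuming it already knows each node's mogstored resident memory in KB. The function name `sentinel_plan`, the host names, the threshold, and the init script path are all hypothetical; the `mogadm host mark` syntax is assumed and worth checking against the mogadm version in use.

```perl
use strict;
use warnings;

# Hypothetical threshold: act once mogstored's resident set exceeds this.
my $MAX_RSS_KB = 1024 * 1024;    # 1 GiB, an arbitrary example value

# Given a host name and its current RSS in KB, return the shell commands
# the sentinel would run, in order. An empty list means "leave it alone".
sub sentinel_plan {
    my ($host, $rss_kb, $max_kb) = @_;
    return () if $rss_kb <= $max_kb;
    return (
        "mogadm host mark $host down",     # assumed syntax: take host out of rotation
        "/etc/init.d/mogstored restart",   # assumed init script path
        "mogadm host mark $host alive",    # assumed syntax: put host back
    );
}

# Example: one healthy node and one bloated node (made-up numbers).
for my $node (["store01", 200_000], ["store02", 1_500_000]) {
    my ($host, $rss_kb) = @$node;
    my @plan = sentinel_plan($host, $rss_kb, $MAX_RSS_KB);
    print "$host: ", (@plan ? join(" && ", @plan) : "ok"), "\n";
}
```

The sketch only builds the command list; actually running the commands and measuring memory are left out on purpose.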

cheers,

- Jared

-- 
Jared Klett
Co-founder, blip.tv
office: 917.546.6989 x2002
mobile: 646.526.8948
aol im: JaredAtWrok
http://blog.blip.tv

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ask Bjørn Hansen
Sent: Wednesday, May 21, 2008 8:20 AM
To: Greg Connor
Cc: mogilefs@lists.danga.com
Subject: Re: mogstored dying: redux


On May 21, 2008, at 3:17, Greg Connor wrote:

 Thanks Mark.  The test script worked fine.  The 403 errors were only 
 occurring with lighttpd used in place of perlbal.  This was a 
 suggestion (Ask's) which seemed like a good thing to try, but lighttpd 
 actually made things worse.  With lighttpd, about 1 in 5 requests 
 failed to store, or failed to close.

Oh, I'm sorry.  I realize now that the "make lighttpd work" patch was never 
committed, darn.  Try the patch below.

http://lists.danga.com/pipermail/mogilefs/2007-November/001401.html

--- server/lib/MogileFS/Device.pm   (revision 1177)
+++ server/lib/MogileFS/Device.pm   (working copy)
@@ -371,7 +371,7 @@
     my $ans = <$sock>;
 
     # if they don't support this method, remember that
-    if ($ans && $ans =~ m!HTTP/1\.[01] (400|405|501)!) {
+    if ($ans && $ans =~ m!HTTP/1\.[01] (400|501)!) {
         $self->{no_mkcol} = 1;
         # TODO: move this into method on device, which propogates to parent
         # and also receive from parent.  so all query workers share this knowledge


--
http://develooper.com/ - http://askask.com/




Re: mogstored dying: redux

2008-05-21 Thread Greg Connor


On May 20, 2008, at 11:27 AM, Mark Smith wrote:

Hi all, I very much appreciate the patient help and advice, but I'm still
having trouble getting even small files stored in my mogile setup.


Given the error message you've pasted (403?) this seems like a
configuration/setup problem.  Are you sure that your MogileFS setup is
even working at all, even without touching mogtool?  Well, it's easy
to figure out if it is or not.  Here, this little script:

---

If the process fails, can you copy the output of it and paste on the
mailing list here?  There should be a lot of text for all of the work
that the library is doing that will tell you what's going on.  Or
anyway, will tell us what's going on, I don't expect most of it to
make sense unless you know the internals of MogileFS.  :)



Thanks Mark.  The test script worked fine.  The 403 errors were only  
occurring with lighttpd used in place of perlbal.  This was a  
suggestion (Ask's) which seemed like a good thing to try, but lighttpd  
actually made things worse.  With lighttpd, about 1 in 5 requests  
failed to store, or failed to close.


I've now reverted back to the standard mogstored/perlbal config, and  
it's *mostly* working but I'm concerned about the frequency of  
mogstored just plain dying... I have to keep a keepalive script  
running to relaunch any mogstored procs that have mysteriously stopped  
running by checking my 16 storage nodes every 5 min.


I'm also worried about intermittent problems when pushing large  
numbers of files (currently using mogtool).  I'm not sure if this  
corresponds to mogstored dying, or trying to hit a dead node before  
the restart kicks in, or what.  The errors given out by mogtool in  
these intermittent cases are one of these:

 MogileFS backend error message: unknown_key unknown_key
 System error message: MogileFS::NewHTTPFile: unable to write to any
   allocated storage node at
   /usr/lib64/perl5/5.8.5/x86_64-linux-thread-multi/IO/Handle.pm line 399
 System error message: Close failed at /usr/bin/mogtool line 816,
   Sock_minime336:7001 line 215.



I can live with transmit errors once in a while, and for now mogtool  
seems to be retrying and recovering.  But if they crash the storage  
node, that's a showstopper.   If it's not normal for mogstored to just  
die like that, I will spend some time trying to figure out why that  
is.  If it *is* normal for mogstored to just die sometimes, I need to  
get rid of it quickly and get lighttpd over its intermittent 403  
problems.  I don't think I have time to do both so I need to pick a  
direction that's more likely to succeed.  My time to evaluate this  
solution for our application is running out quickly.


Thanks again for the replies.  I would be lost without the help from  
the list (which probably means the documentation is weak and puny, but  
c'est la vie).


Re: mogstored dying: redux

2008-05-21 Thread Ask Bjørn Hansen


On May 21, 2008, at 3:17, Greg Connor wrote:

Thanks Mark.  The test script worked fine.  The 403 errors were only  
occurring with lighttpd used in place of perlbal.  This was a  
suggestion (Ask's) which seemed like a good thing to try, but  
lighttpd actually made things worse.  With lighttpd, about 1 in 5  
requests failed to store, or failed to close.


Oh, I'm sorry.  I realize now that the "make lighttpd work" patch was  
never committed, darn.  Try the patch below.


http://lists.danga.com/pipermail/mogilefs/2007-November/001401.html

--- server/lib/MogileFS/Device.pm   (revision 1177)
+++ server/lib/MogileFS/Device.pm   (working copy)
@@ -371,7 +371,7 @@
     my $ans = <$sock>;
 
     # if they don't support this method, remember that
-    if ($ans && $ans =~ m!HTTP/1\.[01] (400|405|501)!) {
+    if ($ans && $ans =~ m!HTTP/1\.[01] (400|501)!) {
         $self->{no_mkcol} = 1;
         # TODO: move this into method on device, which propogates to parent
         # and also receive from parent.  so all query workers share this knowledge
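
To make the effect of the one-line change concrete, here is a small sketch (not part of the patch) that runs the old and new patterns against a few made-up status lines. Only the two regexes come from the diff; the helper name `looks_unsupported` and the sample responses are assumptions for illustration. RFC 2518 lists 405 as the response to MKCOL on a URL that already exists, so the old pattern would presumably have flagged a server as "no MKCOL" the first time a directory was already there.

```perl
use strict;
use warnings;

# The two patterns from the diff: before and after the patch.
my $old = qr{HTTP/1\.[01] (400|405|501)};
my $new = qr{HTTP/1\.[01] (400|501)};

# Hypothetical helper: true if this status line would cause no_mkcol
# to be set, i.e. the device is treated as not supporting MKCOL.
sub looks_unsupported {
    my ($pattern, $status_line) = @_;
    return ($status_line && $status_line =~ $pattern) ? 1 : 0;
}

# Made-up sample responses, for illustration only.
my @samples = (
    "HTTP/1.1 201 Created\r\n",
    "HTTP/1.1 405 Method Not Allowed\r\n",
    "HTTP/1.1 501 Not Implemented\r\n",
);

for my $line (@samples) {
    (my $shown = $line) =~ s/\r\n\z//;    # strip CRLF for display
    printf "%-35s old=%d new=%d\n", $shown,
        looks_unsupported($old, $line),
        looks_unsupported($new, $line);
}
```

Of the three samples, only the 405 line is classified differently by the two patterns.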



--
http://develooper.com/ - http://askask.com/




Re: mogstored dying: redux

2008-05-21 Thread Arthur Bebak

Greg Connor wrote:





MogileFS backend error message: unknown_key unknown_key
System error message: Close failed at /usr/bin/mogtool line 816,
Sock_minime336:7001 line 78.
This was try #1 and it's been 1.06 seconds since we first tried.
Retrying...



I am also seeing a large number of these errors:

System error message: MogileFS::Backend: tracker socket never became 
readable (minime336:7001) when sending command: [create_open 
domain=dbbackups&fid=0&class=dbbackups-recent&multi_dest=1&key=dwh-20080519-vol9,99 
] at /usr/lib/perl5/site_perl/5.8.5/MogileFS/Client.pm line 268


  Close failed at /usr/bin/mogtool line 816
  unable to write to any allocated storage node at 
/usr/lib64/perl5/5.8.5/x86_64-linux-thread-multi/IO/Handle.pm line 399

  Connection reset by peer
  tracker socket never became readable
  socket closed on read at 
/usr/lib/perl5/site_perl/5.8.5/MogileFS/NewHTTPFile.pm line 335
  couldn't connect to mogilefsd backend at 
/usr/lib/perl5/site_perl/5.8.5/MogileFS/Client.pm line 268


Greg, superficially looking at this it seems that all the errors are
networking related with failing socket calls and connectivity issues.

You may want to check for packet loss on your network and for latency issues.
It may even be something as simple as a bad switch/cable
somewhere or somebody else intermittently pushing a lot of traffic through
your local LAN when you're testing (which I assume is
on a GBit network, right?).

Anyway, something to look at.



--
Arthur Bebak
[EMAIL PROTECTED]


Re: mogstored dying: redux

2008-05-20 Thread Mark Smith
 Hi all, I very much appreciate the patient help and advice, but I'm still
 having trouble getting even small files stored in my mogile setup.

Given the error message you've pasted (403?) this seems like a
configuration/setup problem.  Are you sure that your MogileFS setup is
even working at all, even without touching mogtool?  Well, it's easy
to figure out if it is or not.  Here, this little script:

---
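use strict;
use warnings;
use MogileFS::Client;

# Turn on the client library's debug output.
$MogileFS::DEBUG = 1;

# Point these at your own domain and tracker.
my $mogc = MogileFS::Client->new(
    domain => "foo.com::my_namespace",
    hosts  => ['10.0.0.2:1234'],
);

# Store a tiny file under some_key in some_class.
my $fh = $mogc->new_file("some_key", "some_class");

print $fh "test";

unless ($fh->close) {
    die "Error writing file: " . $mogc->errcode . ": " . $mogc->errstr . "\n";
}

# Give replication a moment, then ask the tracker where the file lives.
sleep 5;
my @urls = $mogc->get_paths("some_key");
print "path: $_\n" foreach @urls;

# Clean up the test key.
$mogc->delete("some_key");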
---

Take that, put it on a machine that has the MogileFS client libraries,
and change the values it's using to connect to the server to point at
your tracker.  Then put in a valid class instead of some_class and
give it a shot.  Does it work?  Do you get paths printed?  (I haven't
tested this script, so you might need to kick it a little if there are
any syntax errors and the like.  Just kinda tossed it together.)

If the process fails, can you copy the output of it and paste on the
mailing list here?  There should be a lot of text for all of the work
that the library is doing that will tell you what's going on.  Or
anyway, will tell us what's going on, I don't expect most of it to
make sense unless you know the internals of MogileFS.  :)

Thanks!


-- 
Mark Smith / xb95
[EMAIL PROTECTED]


Re: mogstored dying: redux

2008-05-19 Thread Andy Lo A Foe
Hi,

In my experience WebDAV storage setups (lighttpd, nginx) are much
better at handling large chunks/files than mogstored. I use nginx in a
production environment with files ranging from a couple of bytes to a
gigabyte, no problem. In the pre-production tests I ran mogstored died
reliably with OOM's when handling 100MB+ files. Use mogstored only to
manage the usage stats on your storage nodes in that case.

Gr,
Andy

On Mon, May 19, 2008 at 3:25 AM, Greg Connor [EMAIL PROTECTED] wrote:

 On May 18, 2008, at 5:59 PM, Ask Bjørn Hansen wrote:


 On May 18, 2008, at 17:54, Greg Connor wrote:

  Running.
  Out of memory!
  Out of memory!


 Yikes.   64MB chunks shouldn't be that bad.  Are the storage nodes
 otherwise loaded (high IO wait or some such)?


 Nope, the storage nodes are doing nothing other than mogstored at this time.


 Did you try using another HTTP server (lighttpd, nginx, apache, ...) for
 the file transfers to the storage nodes?  I suspect most/many users use that
 so mogstored doesn't get used that much in high traffic environments ...

 No I have not tried this.  Do you believe mogstored is pretty useless in a
 production environment?  If that's true and widely known, it's too bad the
 documents don't reflect this... Is there a document or list posting that
 explains what parts of mogilefs should be tuned (or outright replaced) for a
 high-traffic application?

 Are there documents stashed somewhere that I'm missing?  I looked at the
 new wiki (last updates about 5 and 10 months ago) and read everything
 available there, and I've read most of the man pages.  I keep finding stuff
 that I'm totally not getting.  I would welcome some advice or pointers on
 how to get apache set up to replace mogstored for file transfers...


Re: mogstored dying: redux

2008-05-19 Thread Greg Connor

Andy Lo A Foe wrote:

Hi,

In my experience WebDAV storage setups (lighttpd, nginx) are much
better at handling large chunks/files than mogstored. I use nginx in a
production environment with files ranging from a couple of bytes to a
gigabyte, no problem. In the pre-production tests I ran mogstored died
reliably with OOM's when handling 100MB+ files. Use mogstored only to
manage the usage stats on your storage nodes in that case.



Hi Andy, thanks for the reply.

Do you feel nginx is better than lighttpd for this?  How about apache?

Is it simply a matter of having the other httpd listen on another port, 
and entering that port number in a config file?  Did you have to do 
anything special to configure httpd (for example, to automatically 
create directories that don't yet exist for PUT requests?)


thanks again


Re: mogstored dying: redux

2008-05-19 Thread Justin Huff
We've been using lighttpd, and it works OK. We have run into problems
with the default mogile-generated config not being able to fully
utilize the devices. I *think* we have that solved now though. We also
saw possible stat caching issues around new dir creation.

server.stat-cache-engine = "disable"
server.network-backend = "linux-sendfile"
server.event-handler = "linux-sysepoll"
server.max-worker = 8

lighttpd-1.4.15

--Justin

Greg Connor wrote:
 Andy Lo A Foe wrote:
 Hi,

 In my experience WebDAV storage setups (lighttpd, nginx) are much
 better at handling large chunks/files than mogstored. I use nginx in a
 production environment with files ranging from a couple of bytes to a
 gigabyte, no problem. In the pre-production tests I ran mogstored died
 reliably with OOM's when handling 100MB+ files. Use mogstored only to
 manage the usage stats on your storage nodes in that case.
 
 
 Hi Andy, thanks for the reply.
 
 Do you feel nginx is better than lighttpd for this?  How about apache?
 
 Is it simply a matter of having the other httpd listen on another port,
 and entering that port number in a config file?  Did you have to do
 anything special to configure httpd (for example, to automatically
 create directories that don't yet exist for PUT requests?)
 
 thanks again
 


Re: mogstored dying: redux

2008-05-19 Thread Ask Bjørn Hansen


On May 19, 2008, at 8:49 AM, Greg Connor wrote:

Is it simply a matter of having the other httpd listen on another  
port, and entering that port number in a config file?  Did you have  
to do anything special to configure httpd (for example, to  
automatically create directories that don't yet exist for PUT  
requests?)


Enabling WebDAV should do that -- however mogilefs should be able to  
configure at least apache and lighttpd automatically.  Be sure to use  
svn trunk as there were some fixes to some of that recently:


http://code.sixapart.com/svn/mogilefs/trunk/server/CHANGES


 - ask

--
http://develooper.com/ - http://askask.com/




mogstored dying: redux

2008-05-18 Thread Greg Connor
I wrote a week or two ago and asked for help with my mogstored dying  
problem.  Thanks to those who responded at that time.  Since then, I  
have upgraded all my nodes (16 storage nodes with 2 also acting as  
trackers) to CentOS 5.1, which runs perl 5.8.8. (The client machine has  
perl 5.8.5).  I'm using the current subversion tree (1177) for  
trackers, storage nodes and clients/utils.


Unfortunately I'm still having a problem with mogstored just dying,  
and I can't figure out why.  Any help or pointers would be appreciated.


I'm currently using mogtool to push a large amount of data: 5 bigfiles  
with a total size of 2454G.  I'm expecting that to be broken up into  
39269 chunks of 64M each, and right now I've got about 19000 chunks  
copied.
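
As a rough cross-check of the chunk count, here is a small sketch of the arithmetic. It assumes "2454G" means GiB and "64M" means MiB, and that each of the 5 bigfiles can end in one partial chunk; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $total_gib = 2454;   # total size quoted above, read as GiB (assumption)
my $chunk_mib = 64;     # mogtool chunk size quoted above, read as MiB
my $num_files = 5;      # number of bigfiles

# Chunks needed if the data were one contiguous stream.
my $whole_chunks = $total_gib * 1024 / $chunk_mib;

# Each bigfile may end in a partial chunk, so allow one extra per file.
my $upper_bound = $whole_chunks + $num_files;

print "about $whole_chunks to $upper_bound chunks\n";
```

Since the 2454G total is itself rounded, the expected figure of 39269 chunks falls within the range this gives.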


My biggest problem right now is that mogstored just plain dies.  It  
just stops with no message to either syslog or to its output.  Of my  
16 nodes, they have all stopped running mogstored between 4 and 10  
times.  In order to keep the copy going, I have to check for mogstored  
running every minute and restart it if not running.  The only thing  
appearing in syslog is after it starts up again, it says perlbal[pid]:  
beginning run.


The start script I have been using says --daemonize so I ran mogstored  
without --daemonize flag and got a bit more output:

Running.
Out of memory!
Out of memory!
Callback called exit.
Callback called exit.
END failed--call queue aborted.
beginning run
Running.



There's a bit more information in mogtool's output but I don't know if  
these coincide with the mogstored crashes.  Here are a few:


WARNING: Unable to save file 'collect-20080516-vol6,280': Close failed  
at /usr/bin/mogtool line 816, Sock_minime336:7001 line 283.

MogileFS backend error message: unknown_key unknown_key
System error message: Close failed at /usr/bin/mogtool line 816,  
Sock_minime336:7001 line 283.


WARNING: Unable to save file 'collect-20080516-vol6,311':  
MogileFS::NewHTTPFile: error reading from node for device 337007:  
Connection reset by peer at (eval 18) line 1

MogileFS backend error message: unknown_key unknown_key
System error message: MogileFS::NewHTTPFile: error reading from node  
for device 337007: Connection reset by peer at (eval 18) line 1


WARNING: Unable to save file 'collect-20080516-vol6,1341':
MogileFS::NewHTTPFile: error writing to node for device 343012:
Connection reset by peer at
/usr/lib64/perl5/5.8.5/x86_64-linux-thread-multi/IO/Handle.pm line 399

MogileFS backend error message: unknown_key unknown_key
System error message: MogileFS::NewHTTPFile: error writing to node for
device 343012: Connection reset by peer at
/usr/lib64/perl5/5.8.5/x86_64-linux-thread-multi/IO/Handle.pm line 399


WARNING: Unable to save file 'collect-20080516-vol6,1736': Close  
failed at /usr/bin/mogtool line 816, Sock_minime336:7001 line 1739.

MogileFS backend error message: unknown_key unknown_key
System error message: Close failed at /usr/bin/mogtool line 816,  
Sock_minime336:7001 line 1739.


WARNING: Unable to save file 'collect-20080516-vol6,2373':
MogileFS::NewHTTPFile: unable to write to any allocated storage node at
/usr/lib64/perl5/5.8.5/x86_64-linux-thread-multi/IO/Handle.pm line 399

MogileFS backend error message: unknown_key unknown_key
System error message: MogileFS::NewHTTPFile: unable to write to any
allocated storage node at
/usr/lib64/perl5/5.8.5/x86_64-linux-thread-multi/IO/Handle.pm line 399


A few times I observed mogstored not responding to the tracker (mogadm  
check just pauses when listing that host) and in that case, killing  
and restarting mogstored brings it back.  I could probably check for  
this condition too, but now we're getting beyond a simple  
wrapper/restart/sentinel script.




Is the experience of mogstored just plain dying a common one, or is it  
pretty rare?  If that were the only thing wrong I could get around it  
by wrapping mogstored with a shell script that relaunches it as soon  
as it quits, but I'd rather not have to do that... I'd rather get at  
the root of the problem and make it not die in the first place.



A more important question I have is:  Am I trying to do something with  
MogileFS that it's totally not designed for?  Is anyone else out there  
known to be using mogile for really huge files, chunked like mogtool  
does, and if so, were people happy with the results?   If it's really  
minor problems, I could probably fix them myself, but I'm concerned  
that the lack of documentation about mogile's internals would hamper  
self-support efforts.



Thanks
gregc



Re: mogstored dying: redux

2008-05-18 Thread Greg Connor


On May 18, 2008, at 5:59 PM, Ask Bjørn Hansen wrote:



On May 18, 2008, at 17:54, Greg Connor wrote:


  Running.
  Out of memory!
  Out of memory!



Yikes.   64MB chunks shouldn't be that bad.  Are the storage nodes  
otherwise loaded (high IO wait or some such)?



Nope, the storage nodes are doing nothing other than mogstored at this  
time.



Did you try using another HTTP server (lighttpd, nginx, apache, ...)  
for the file transfers to the storage nodes?  I suspect most/many  
users use that so mogstored doesn't get used that much in high  
traffic environments ...


No I have not tried this.  Do you believe mogstored is pretty useless  
in a production environment?  If that's true and widely known, it's  
too bad the documents don't reflect this... Is there a document or  
list posting that explains what parts of mogilefs should be tuned (or  
outright replaced) for a high-traffic application?


Are there documents stashed somewhere that I'm missing?  I looked at  
the new wiki (last updates about 5 and 10 months ago) and read  
everything available there, and I've read most of the man pages.  I  
keep finding stuff that I'm totally not getting.  I would welcome some  
advice or pointers on how to get apache set up to replace mogstored  
for file transfers...