Re: [AOLSERVER] large file uploads

Jim Davidson Tue, 01 Dec 2009 12:28:01 -0800

Howdy,

Looking back at the code and trying to remember what I was thinking at the 
time, I ran across the header comment to "NsConnContent" which mentions the 
possibility of a mapping failure (see below).  This reminded me of what was 
going on...

Originally there was no "spool to file" option -- everything just read into a 
big heap-based buffers which was goofy, needing potentially lots of fragmented 
heap and/or subject to denial of service attacks (i.e., lots of requests to 
upload big things overloading memory).

The "spool to file" thing came later (can't remember when), using a nifty open 
temp file thing which avoids the comparably high-cost of actually creating a 
file on each connection, instead simply re-using an open fd and truncating it 
after each request.   There are a few side effects with this:

1. The open file itself is not visible on disk -- it's deleted right after open 
on Unix (there's some weird "secure" option on Windows which can't delete an 
open file).  This is a security benefit but it means you can't simply "stat" 
the underlying file from another thread/connection to monitor progress.  And, 
you can't simply rename the file to some permanent location -- you have to 
actually copy the content to a new file if that's your intention (in practice, 
you save a portion of the upload content in the multipart stuff anyway).

2.  The original code assuming everything was in memory and a string so, like 
most AOLserver improvements, that presented a backwards compatibility problem.  
The compromise was that if you called the original code (e.g., with "ns_conn 
form"), the new code mapped the file to make it look like it was in memory.  
This was an improvement as after the connection the mapping could go away, 
reclaiming the memory and avoiding fragmentation, but in practice you could 
still get enough simultaneous big uploads to run out of virtual memory.  You 
could then switch to 64bit or avoid those calls (in your code and possibly 
other filter code) using the clever incremental Tcl read/parse code in 
ns_parsefromfile mentioned below.

3.  The file upload is pre-connection so it's being spooled/copied in the 
single "driver" thread (driver is vestigial name which no longer makes sense).  
This is efficient enough but means something needs to be done to communicate 
progress to another thread if you want to get a background status updates.  A 
quick look at SockReadContent in driver.c shows you may just want the 
difference between connPtr->contentLength (to upload) and connPtr->avail 
(already copied).  But, those things aren't updated under a mutex lock -- you 
could do a dirty read (probably safe enough) in another thread but if you're 
digging around in that code you may as well add a structure to maintain 
progress specifically and a quick API to fetch it.  Lock contention/overhead 
would be insignificant.  You'd have one more problem of identifying the 
connection of interest and pondering security issues, if any, with that 
approach.   Plus you could have the problem of a rotor of machines where the 
upload is on!
  one machine and the check is on another so alternatively you could add some 
"progress callback" API instead that could be arbitrarily clever, e.g., sending 
it's update progress at certain points to some other shared store (hmm... that 
would be pretty cool feature).

-Jim

/*
 *----------------------------------------------------------------------
 *
 * NsConnContent --
 *
 *      Return the connction content buffer, mapping the temp file
 *      if necessary.
 *
 *      NB: It may make better sense to update the various content
 *      reading/parsing code to handle true incremental reads from
 *      an open file instead of risking a potential mapping failure.
 *      The current approach keeps the code simple and flexible.
 *
 * Results:
 *      Pointer to start of content or NULL if mapping failed.
 *
 * Side effects:
 *      If nextPtr and/or availPtr are not NULL, they are updated
 *      with the next byte to read and remaining content available.
 *
 *----------------------------------------------------------------------
 */

On Nov 24, 2009, at 8:06 PM, Tom Jackson wrote:

> John,
> 
> I think  your analysis is correct. You have to avoid certain API or
> you do extra work, and maybe take up extra space. Usually this is
> unimportant, but in your case it isn't.
> 
> Anyway, sounds like you are making progress. Please let me know if I
> have led you down the wrong path. This stuff isn't really documented
> anywhere.
> 
> tom jackson
> 
> On Tue, Nov 24, 2009 at 2:44 PM, John Buckman <[email protected]> wrote:
>> Tom, I figured out the problem with the "memory bloat" when I used your 
>> alternative ns_getform to parse a large file upload temp file.
>> 
>> The problem is that if I call ns_queryget after calling your ns_getform, 
>> that causes aolserver to re-parse the large file upload, and to do it in the 
>> old memory-inefficient way.
>> 
>> I'm not sure if there's a way for your version of ns_getform to tell the 
>> aolserver internals that the parsing of the form is done, so subsequent 
>> ns_queryget calls don't cause a re-parse.
>> 
>> At any rate, there's an easy workaround, which is to read things right out 
>> the ns_set that your ns_getform populates.
>> 
>> -john
>> 
>> 
>> On Nov 23, 2009, at 8:47 PM, John Buckman wrote:
>> 
>>> Tom, there is some sort of weird interaction effect on aolserver when doing 
>>> tcl stuff and there is a large file as the temporary file.  Some cases 
>>> cause the nsd process to bloat to the size of the memory.  I wasn't able to 
>>> figure out why. I wasn't able to fix your ns_getform to not bloat.
>>> 
>>> However, I was able to copy Vlad's ns_parseformfile proc for use on 
>>> aolserver from naviserver, and it doesn't bloat.
>>> 
>>> Here is the code for handling a large file uploaded as a file, rather than 
>>> in memory:
>>> 
>>>       set type [ns_set iget [ns_conn headers] content-type]
>>>       set form [ns_set create]
>>>       ns_parseformfile $form $type
>>>       array set formdata [ns_set array $form]
>>>       puts "final array: [array get formdata]"
>>> 
>>> 
>>> proc ns_parseformfile { form contentType } {
>>> 
>>>       set fp [ns_conn contentchannel]
>>> 
>>>    if { ![regexp -nocase {boundary=(.*)$} $contentType 1 b] } {
>>>               puts "Warning: no MIME boundary"
>>>        return
>>>    }
>>> 
>>>    fconfigure $fp -encoding binary -translation binary
>>>    set boundary "--$b"
>>> 
>>>    while { ![eof $fp] } {
>>>       # skip past the next boundary line
>>>       if { ![string match $boundary* [string trim [gets $fp]]] } {
>>>           continue
>>>       }
>>> 
>>>       # fetch the disposition line and field name
>>>       set disposition [string trim [gets $fp]]
>>>       if { ![string length $disposition] } {
>>>           break
>>>       }
>>> 
>>>       set disposition [split $disposition \;]
>>>       set name [string trim [lindex [split [lindex $disposition 1] =] 1] \"]
>>> 
>>>       # fetch and save any field headers (usually just content-type for 
>>> files)
>>> 
>>>       while { ![eof $fp] } {
>>>           set line [string trim [gets $fp]]
>>>           if { ![string length $line] } {
>>>               break
>>>           }
>>>           set header [split $line :]
>>>           set key [string tolower [string trim [lindex $header 0]]]
>>>           set value [string trim [lindex $header 1]]
>>> 
>>>           ns_set put $form $name.$key $value
>>>       }
>>> 
>>>       if { [llength $disposition] == 3 } {
>>>           # uploaded file -- save the original filename as the value
>>>           set filename [string trim [lindex [split [lindex $disposition 2] 
>>> =] 1] \"]
>>>           ns_set put $form $name $filename
>>> 
>>>           # read lines of data until another boundary is found
>>>           set start [tell $fp]
>>>           set end $start
>>> 
>>>           while { ![eof $fp] } {
>>>               if { [string match $boundary* [string trim [gets $fp]]] } {
>>>                   break
>>>               }
>>>               set end [tell $fp]
>>>           }
>>>           set length [expr $end - $start - 2]
>>> 
>>>           # create a temp file for the content, which will be deleted
>>>           # when the connection close.  ns_openexcl can fail, hence why
>>>           # we keep spinning
>>> 
>>>           set tmp ""
>>>           while { $tmp == "" } {
>>>               set tmpfile [ns_tmpnam]
>>>               set tmp [ns_openexcl $tmpfile]
>>>           }
>>> 
>>>           catch {fconfigure $tmp -encoding binary -translation binary}
>>> 
>>>           if { $length > 0 } {
>>>               seek $fp $start
>>>               ns_cpfp $fp $tmp $length
>>>           }
>>> 
>>>           close $tmp
>>>           seek $fp $end
>>>           ns_set put $form $name.tmpfile $tmpfile
>>> 
>>>            if { [ns_conn isconnected] } {
>>>               ns_atclose "ns_unlink -nocomplain $tmpfile"
>>>            }
>>> 
>>>       } else {
>>>           # ordinary field - read lines until next boundary
>>>           set first 1
>>>           set value ""
>>>           set start [tell $fp]
>>> 
>>>           while { [gets $fp line] >= 0 } {
>>>               set line [string trimright $line \r]
>>>               if { [string match $boundary* $line] } {
>>>                   break
>>>               }
>>>               if { $first } {
>>>                   set first 0
>>>               } else {
>>>                   append value \n
>>>               }
>>>               append value $line
>>>               set start [tell $fp]
>>>           }
>>>           seek $fp $start
>>>           ns_set put $form $name $value
>>>       }
>>>    }
>>>    close $fp
>>> }
>>> 
>> 
>> 
>> --
>> AOLserver - http://www.aolserver.com/
>> 
>> To Remove yourself from this list, simply send an email to 
>> <[email protected]> with the
>> body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: 
>> field of your email blank.
>> 
> 
> 
> --
> AOLserver - http://www.aolserver.com/
> 
> To Remove yourself from this list, simply send an email to 
> <[email protected]> with the
> body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: 
> field of your email blank.

--
AOLserver - http://www.aolserver.com/

To Remove yourself from this list, simply send an email to 
<[email protected]> with the
body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: 
field of your email blank.

Re: [AOLSERVER] large file uploads

Reply via email to