Re: [racket-users] Missing request-post-data/raw (from web-server/http)

2017-06-30 Thread Philip McGrath
Thanks, Jay. It is definitely POST, and there is a Content-Length header,
so it seems like the problem is indeed #3. I was expecting the raw data to
be there even if it had been parsed. I believe the POST data of
#"corpus=austen=corpus.CorpusMetadata" was also parsed into bindings
(though not from multipart, obviously).

So it sounds like what I'll need to do is detect when this situation is
happening (I guess that would be when the method is POST, the
request-post-data/raw is #f, and there are some bindings) and convert the
bindings back into multipart form data to give to http-sendrecv/url.
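
A minimal sketch of the conversion step described above, not a tested part of any proxy: it rebuilds a multipart/form-data body from a list of bindings. The names bindings->multipart, binding->part and file-content-type, the application/octet-stream fallback, and the caller-chosen boundary are all assumptions for illustration. It also assumes field names and filenames contain no double quotes or line breaks, and the rebuilt body will not be byte-for-byte identical to what the browser sent.

```racket
#lang racket/base
(require web-server/http/request-structs)

(define crlf #"\r\n")

;; Hypothetical helper: the Content-Type recorded for an uploaded file,
;; or a generic fallback when the part carried none.
(define (file-content-type b)
  (define h (headers-assq* #"Content-Type" (binding:file-headers b)))
  (if h (header-value h) #"application/octet-stream"))

;; Encode one binding as a single multipart part (boundary line included).
(define (binding->part b boundary)
  (cond
    [(binding:form? b)
     (bytes-append #"--" boundary crlf
                   #"Content-Disposition: form-data; name=\"" (binding-id b) #"\"" crlf
                   crlf
                   (binding:form-value b) crlf)]
    [(binding:file? b)
     (bytes-append #"--" boundary crlf
                   #"Content-Disposition: form-data; name=\"" (binding-id b)
                   #"\"; filename=\"" (binding:file-filename b) #"\"" crlf
                   #"Content-Type: " (file-content-type b) crlf
                   crlf
                   (binding:file-content b) crlf)]
    [else (error 'binding->part "unsupported binding: ~e" b)]))

;; bindings->multipart : (listof binding?) bytes? -> bytes?
;; Concatenates all parts and appends the closing boundary line.
(define (bindings->multipart bindings boundary)
  (bytes-append
   (apply bytes-append (for/list ([b (in-list bindings)])
                         (binding->part b boundary)))
   #"--" boundary #"--" crlf))
```

The forwarded request would then also need a matching "Content-Type: multipart/form-data; boundary=..." header using the same boundary bytes, and the boundary must not occur inside any of the values.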

-Philip

On Fri, Jun 30, 2017 at 8:20 AM, Jay McCarthy 
wrote:

> Hi Philip,
>
> I don't necessarily know the answer and it's possible that it is an
> error. I'll explain what it is doing and maybe that will help us move
> forward.
>
> 1) The request-bindings/raw is just an abstraction over
> request-post-data/raw (and the URI).
> 2) The request-post-data/raw is always #f for GETs. Are you sure they are
> POSTs?
> 3) POSTs with multipart form data are converted into
> request-bindings, and the raw, un-parsed data is not made available.
> 4) If there's no Content-Length header, then the data is not exposed,
> even if there is some.
>
> I think that your problem may be (3). It sounds like you expect to see
> a copy of the raw data of the request all the time even if it has been
> parsed. (The logic of the current behavior is that at the
> "application" level there is no POST data, only form data; it had to
> be sent in the data part of the transport layer only because of
> "transport" level constraints on the length of URIs.)
>
> Jay
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [racket-users] Missing request-post-data/raw (from web-server/http)

2017-06-30 Thread Jay McCarthy
Hi Philip,

I don't necessarily know the answer and it's possible that it is an
error. I'll explain what it is doing and maybe that will help us move
forward.

1) The request-bindings/raw is just an abstraction over
request-post-data/raw (and the URI).
2) The request-post-data/raw is always #f for GETs. Are you sure they are POSTs?
3) POSTs with multipart form data are converted into
request-bindings, and the raw, un-parsed data is not made available.
4) If there's no Content-Length header, then the data is not exposed,
even if there is some.

I think that your problem may be (3). It sounds like you expect to see
a copy of the raw data of the request all the time even if it has been
parsed. (The logic of the current behavior is that at the
"application" level there is no POST data, only form data; it had to
be sent in the data part of the transport layer only because of
"transport" level constraints on the length of URIs.)
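
The four cases above can be turned into a small diagnostic, sketched here under assumptions: explain-post-data and its result symbols are hypothetical names, and the checks only mirror the behavior as described in this message rather than anything guaranteed by the library.

```racket
#lang racket/base
(require web-server/http/request-structs)

;; explain-post-data : request? -> symbol?
;; Guesses why request-post-data/raw is (or is not) #f for a request.
(define (explain-post-data req)
  (cond
    ;; Raw bytes are present, so nothing needs explaining.
    [(request-post-data/raw req) 'raw-available]
    ;; Case 2: the raw data is always #f for GETs.
    [(not (equal? (request-method req) #"POST")) 'not-a-post]
    ;; Case 4: without Content-Length the data is not exposed.
    [(not (headers-assq* #"Content-Length" (request-headers/raw req)))
     'no-content-length]
    ;; Case 3: the body was parsed into bindings instead.
    [(pair? (request-bindings/raw req)) 'parsed-into-bindings]
    [else 'unknown]))
```

A proxy could log this symbol for each request whose raw data is #f before deciding how to forward it.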

Jay


On Thu, Jun 29, 2017 at 9:08 PM, Philip McGrath
 wrote:
> I'm working on a Racket web application for which I need to proxy certain
> requests to a non-Racket service over HTTP. I've built a very basic proxy on
> top of http-sendrecv/url that works quite well for the most part.
>
> For POST requests, I pass the request-post-data/raw of the original request
> as the #:data argument of http-sendrecv/url.
>
> However, I've discovered that certain POST requests (specifically involving
> file uploads) are not working as expected. On these requests, Chrome reports
> that it is performing a request with a header
> Content-Type:multipart/form-data;
> boundary=WebKitFormBoundaryAJOgATwBujJhhtbY and a payload as follows:
>
> --WebKitFormBoundaryAJOgATwBujJhhtbY
> Content-Disposition: form-data; name="tool"
> corpus.CorpusCreator
> --WebKitFormBoundaryAJOgATwBujJhhtbY
> Content-Disposition: form-data; name="palette"
> default
> --WebKitFormBoundaryAJOgATwBujJhhtbY
> Content-Disposition: form-data; name="textarea-1014-inputEl"
> Type in one or more URLs on separate lines or paste in a full text.
> --WebKitFormBoundaryAJOgATwBujJhhtbY
> Content-Disposition: form-data; name="upload"; filename="tmp-file.txt"
> Content-Type: text/plain
> --WebKitFormBoundaryAJOgATwBujJhhtbY--
>
>
> However, at the Racket level, request-post-data/raw returns #f for these
> requests — but, adding to my confusion, the bindings still show up in
> request-bindings/raw.
>
> Why doesn't this content show up in request-post-data/raw? Is there a way to
> access the raw, original data for these requests, or do I need to somehow
> reconstruct it from the bindings?
>
> Thanks very much,
> Philip
>



-- 
-=[ Jay McCarthy   http://jeapostrophe.github.io ]=-
-=[ Associate Professor   PLT @ CS @ UMass Lowell ]=-
-=[ Moses 1:33: And worlds without number have I created; ]=-



Re: [racket-users] Missing request-post-data/raw (from web-server/http)

2017-06-29 Thread Philip McGrath
Thanks for your comments.

The only legal files to upload in this case are plain text, so I'm not too
worried about size. I'm relying on the web-server libraries to deal with
any malicious attempts to send overwhelmingly large files (if that's a bad
idea, I'd definitely appreciate hearing it!). Other parts of the
application are implemented in #lang web-server, including some access
control logic surrounding the requests that are proxied to the external
service.

With other requests, the post-data/raw field of the request struct has
been #f only when the method field is #"GET": with POST requests, it has
otherwise (and I thought it always would) contained the raw POST data, e.g.
#"corpus=austen=corpus.CorpusMetadata". I thought the bindings from
the bindings/raw-promise field were simply an abstraction over the
post-data/raw (and/or query part of the uri field), which is why I'm
confused that this POST request has bindings, but has #f for its
post-data/raw.

-Philip

On Thu, Jun 29, 2017 at 9:44 PM, Neil Van Dyke  wrote:

> I don't know the answer to your particular questions with `web-server`
> (I've made my own implementations of this in the past), and these comments
> might not apply to your particular application, but I'll mention here for
> whoever is interested...
>
> It sounds like you're using this, which might preempt your question:
>
> post-data/raw : (or/c false/c bytes?)
>
> Does your application permit a large file upload (an uploaded DVD-ROM
> ".iso" file, like for a Linux distro install disc 1, is typically a few
> gigabytes, and video files can also get huge), and is your program
> (including libraries it uses) going to try to allocate gigabytes at a time
> just for one HTTP request?
>
> If the `POST` data is potentially huge, you might want to think about
> doing stream reading of it (i.e., not sucking it all into memory before you
> do something with it), and sending blocks out your proxy approximately as
> soon as they come in (without buffering too much).  That can make your
> program more robust, lower latency, and maybe even improve overall speed.
>
> Or, if you want to keep getting a convenient byte string out of the MIME
> parser, and you plan to reject huge `POST` data before it
> accidentally/intentionally DoS's your server, that rejection will probably
> have to happen either as the HTTP request is being read, or in the MIME
> multipart parser (when the request is in MIME multipart, which `POST`
> isn't always, and if the HTTP code hands off a pretty raw input port to
> multipart parsing code, which it should).  This is because you can't
> assume that HTTP or part headers will tell you the content size before
> you read the content: sometimes you have to read to find the EOF or
> the MIME boundary string kludge.
>
> I think streaming algorithms are usually the way to go for potentially
> huge data.  (Well, until you then get into what I'll call "poetic license"
> situations, in which you know how to do it in streaming, and you know why
> you don't have to stream in this case.)
>



Re: [racket-users] Missing request-post-data/raw (from web-server/http)

2017-06-29 Thread Neil Van Dyke
I don't know the answer to your particular questions with `web-server` 
(I've made my own implementations of this in the past), and these 
comments might not apply to your particular application, but I'll 
mention here for whoever is interested...


It sounds like you're using this, which might preempt your question:


post-data/raw : (or/c false/c bytes?)


Does your application permit a large file upload (an uploaded DVD-ROM 
".iso" file, like for a Linux distro install disc 1, is typically a few 
gigabytes, and video files can also get huge), and is your program 
(including libraries it uses) going to try to allocate gigabytes at a 
time just for one HTTP request?


If the `POST` data is potentially huge, you might want to think about 
doing stream reading of it (i.e., not sucking it all into memory before 
you do something with it), and sending blocks out your proxy 
approximately as soon as they come in (without buffering too much).  
That can make your program more robust, lower latency, and maybe even 
improve overall speed.


Or, if you want to keep getting a convenient byte string out of the MIME 
parser, and you plan to reject huge `POST` data before it 
accidentally/intentionally DoS's your server, that rejection will probably 
have to happen either as the HTTP request is being read, or in the MIME 
multipart parser (when the request is in MIME multipart, which `POST` 
isn't always, and if the HTTP code hands off a pretty raw input port to 
multipart parsing code, which it should).  This is because you can't 
assume that HTTP or part headers will tell you the content size before 
you read the content: sometimes you have to read to find the EOF or 
the MIME boundary string kludge.
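
To make the two ideas above concrete (forward blocks as they arrive, and reject oversized bodies while reading rather than trusting a declared length), here is a rough sketch using only `racket/base` ports. The name `copy-port/limit`, its `#:block-size` keyword, and the 4096-byte default are assumptions for illustration; it is not how `web-server` reads requests.

```racket
#lang racket/base

;; copy-port/limit : input-port? output-port? exact-nonnegative-integer?
;;                   -> exact-nonnegative-integer?
;; Copies `in` to `out` one block at a time, never holding more than
;; `block-size` bytes in memory, and raises an error as soon as more
;; than `max-bytes` have been read. Returns the number of bytes copied.
(define (copy-port/limit in out max-bytes #:block-size [block-size 4096])
  (define buf (make-bytes block-size))
  (let loop ([total 0])
    ;; Reads whatever is available, up to one block; eof at end of input.
    (define n (read-bytes-avail! buf in))
    (cond
      [(eof-object? n)
       (flush-output out)
       total]
      [else
       (define new-total (+ total n))
       (when (> new-total max-bytes)
         (error 'copy-port/limit "body exceeds ~a bytes" max-bytes))
       (write-bytes buf out 0 n)
       (loop new-total)])))
```

Because the limit is checked on bytes actually read, it still applies when no header announces the size up front.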


I think streaming algorithms are usually the way to go for potentially 
huge data.  (Well, until you then get into what I'll call "poetic 
license" situations, in which you know how to do it in streaming, and 
you know why you don't have to stream in this case.)




[racket-users] Missing request-post-data/raw (from web-server/http)

2017-06-29 Thread Philip McGrath
I'm working on a Racket web application for which I need to proxy certain
requests to a non-Racket service over HTTP. I've built a very basic proxy
on top of http-sendrecv/url that works quite well for the most part.

For POST requests, I pass the request-post-data/raw of the original request
as the #:data argument of http-sendrecv/url.
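
For reference, the forwarding step looks roughly like the sketch below. This is a simplified illustration rather than the actual proxy code: forward-request and content-type-headers are hypothetical names, only the Content-Type header is passed along, and the #:send keyword is an assumed hook that exists only so the function can be exercised without a network.

```racket
#lang racket/base
(require net/url
         web-server/http/request-structs)

;; Forward the original Content-Type, if any, so the upstream service
;; can interpret the body.
(define (content-type-headers req)
  (define h (headers-assq* #"Content-Type" (request-headers/raw req)))
  (if h
      (list (bytes-append #"Content-Type: " (header-value h)))
      '()))

;; forward-request : request? url? -> whatever `send` returns
;; With the default `send` (http-sendrecv/url) that is three values:
;; the status line, the response headers, and an input port for the body.
(define (forward-request req target #:send [send http-sendrecv/url])
  (send target
        #:method (request-method req)
        #:headers (content-type-headers req)
        #:data (request-post-data/raw req)))
```

When request-post-data/raw is #f, this sends no body at all, which is consistent with the failures described next.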

However, I've discovered that certain POST requests (specifically involving
file uploads) are not working as expected. On these requests, Chrome
reports that it is performing a request with a header
Content-Type:multipart/form-data;
boundary=WebKitFormBoundaryAJOgATwBujJhhtbY and a payload as follows:

--WebKitFormBoundaryAJOgATwBujJhhtbY
Content-Disposition: form-data; name="tool"
corpus.CorpusCreator
--WebKitFormBoundaryAJOgATwBujJhhtbY
Content-Disposition: form-data; name="palette"
default
--WebKitFormBoundaryAJOgATwBujJhhtbY
Content-Disposition: form-data; name="textarea-1014-inputEl"
Type in one or more URLs on separate lines or paste in a full text.
--WebKitFormBoundaryAJOgATwBujJhhtbY
Content-Disposition: form-data; name="upload"; filename="tmp-file.txt"
Content-Type: text/plain
--WebKitFormBoundaryAJOgATwBujJhhtbY--


However, at the Racket level, request-post-data/raw returns #f for these
requests — but, adding to my confusion, the bindings still show up
in request-bindings/raw.

Why doesn't this content show up in request-post-data/raw? Is there a way
to access the raw, original data for these requests, or do I need to
somehow reconstruct it from the bindings?

Thanks very much,
Philip
