On Mon, Dec 14, 2009 at 13:03, Dean Michael Berris
<[email protected]> wrote:
> Hi Jeroen,
>
> On Mon, Dec 14, 2009 at 5:42 PM, Jeroen Habraken <[email protected]> wrote:
>> On Mon, Dec 14, 2009 at 10:20, Glyn Matthews <[email protected]> wrote:
> [snip]
>>
>> I'm currently working on URI, and the HTTP part in specific, trying to
>> make it more strict, RFC compliant. The query and fragments should be
>> working now, the path is still a bit of a pain. I'll keep you up to
>> date, and expect a patch sometime soon :)
>>
>
> Cool, please either fork the library on Github or send a git patch
> later on. I will be freezing the Subversion repository tomorrow.
>
> Have a good day!
>
> --
> Dean Michael Berris
> blog.cplusplus-soup.com | twitter.com/mikhailberis
> linkedin.com/in/mikhailberis | facebook.com/dean.berris | deanberris.com
>
> ------------------------------------------------------------------------------
> Return on Information:
> Google Enterprise Search pays you back
> Get the facts.
> http://p.sf.net/sfu/google-dev2dev
> _______________________________________________
> Cpp-netlib-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/cpp-netlib-devel
>

Hi,

I've decided to roll an initial patch; please find it attached. It
changes the following:
- The scheme in the generic URI is now parsed more strictly, in line
with the RFC: a letter, followed by any number of letters, digits,
"+", "." or "-".
- The scheme is converted to lower case, because the RFC states: "For
resiliency, programs interpreting URI should treat upper case letters
as equivalent to lower case in scheme names".
- I've changed the port parser to use ushort_ and uint16_t. The RFC
specifies the port as *digit, but I think it should be limited to the
valid network ports, thus 0 <= port <= 2**16 - 1.
- The query and fragment are now parsed in conformance with the RFC, I
believe. I'd like to change this later to parse the query into a
std::list<std::pair<string_type, string_type> >.
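
To make the scheme and port rules above concrete, here is a minimal
sketch in plain C++ rather than Spirit. It is not the code from the
patch: the helper names (is_valid_scheme, to_lower_scheme, parse_port)
are hypothetical, and rejecting an empty port string is an assumption
of this sketch, not something the RFC grammar requires.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// ASCII-only character classes, so the result does not depend on the locale.
static bool is_ascii_alpha(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}
static bool is_ascii_digit(char c) {
    return c >= '0' && c <= '9';
}

// Hypothetical helper: true if s is a letter followed by any number of
// letters, digits, '+', '-' or '.'.
bool is_valid_scheme(const std::string& s) {
    if (s.empty() || !is_ascii_alpha(s[0]))
        return false;
    for (std::size_t i = 1; i < s.size(); ++i) {
        const char c = s[i];
        if (!is_ascii_alpha(c) && !is_ascii_digit(c) &&
            c != '+' && c != '-' && c != '.')
            return false;
    }
    return true;
}

// Hypothetical helper: return a lower-cased copy of the scheme.
std::string to_lower_scheme(std::string s) {
    for (char& c : s) {
        if (c >= 'A' && c <= 'Z')
            c = static_cast<char>(c - 'A' + 'a');
    }
    return s;
}

// Hypothetical helper: parse a run of digits into a 16-bit port.
// Returns false for non-digits, for an empty string (an assumption of
// this sketch), and for any value above 65535 (2**16 - 1).
bool parse_port(const std::string& s, std::uint16_t& port) {
    if (s.empty())
        return false;
    std::uint32_t value = 0;
    for (char c : s) {
        if (!is_ascii_digit(c))
            return false;
        value = value * 10 + static_cast<std::uint32_t>(c - '0');
        if (value > 65535u)
            return false;  // would not fit in uint16_t
    }
    assert(value <= 65535u);
    port = static_cast<std::uint16_t>(value);
    return true;
}
```

With these helpers, "HTTP" is accepted and normalised to "http", "1http"
is rejected, and a port of "65536" fails instead of silently wrapping
around.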

Note that, because of the way the current parser works, URI decoding
can get by with far fewer checks once the URI has been validated. I
don't know whether relying on that is a good idea, though.
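
As an illustration of that trade-off, here is a rough sketch of a
percent-decoder that relies on prior validation. It is not part of the
patch; decode_validated and hex_value are hypothetical names, and the
precondition (every '%' is followed by exactly two hex digits, as the
escaped rule requires) is only checked by assert, so a release build
would trust the parser completely.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: numeric value of one hex digit. The caller
// guarantees that c is a hex digit; this is only asserted, not handled.
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    assert(c >= 'A' && c <= 'F');
    return c - 'A' + 10;
}

// Hypothetical decoder. Precondition: `in` has already been accepted by
// a strict parser, so every '%' is followed by two hex digits.
std::string decode_validated(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%') {
            assert(i + 2 < in.size());  // guaranteed by validation
            out += static_cast<char>(hex_value(in[i + 1]) * 16 +
                                     hex_value(in[i + 2]));
            i += 2;  // skip the two hex digits just consumed
        } else {
            out += in[i];
        }
    }
    return out;
}
```

The decoder stays short because it never has to report an error, but it
is only safe when called on strings that really went through the
parser, which is the coupling I'm unsure about.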

Jeroen
diff -ru netlib_0_4/boost/network/uri/detail/parse_uri.hpp netlib_0_4/boost/network/uri/detail/parse_uri.hpp
--- netlib_0_4/boost/network/uri/detail/parse_uri.hpp	2009-12-10 10:28:20.000000000 +0100
+++ netlib_0_4/boost/network/uri/detail/parse_uri.hpp	2009-12-14 13:47:47.000000000 +0100
@@ -26,6 +26,7 @@
                 using spirit::ascii::cntrl;
                 using spirit::ascii::alnum;
                 using spirit::ascii::space;
+                using spirit::ascii::alpha;
                 using namespace spirit::qi::labels;
                 using fusion::tie;
 
@@ -40,7 +41,7 @@
                 bool ok = parse(
                         start_, end_, 
                         (
-                            +((alnum|char_("+.-")) - ':')
+                            (alpha > *(alnum|char_("+.-")))
                          >> ':'
                          >> 
                             +(char_ - (cntrl|space))
diff -ru netlib_0_4/boost/network/uri/http/detail/parse_specific.hpp netlib_0_4/boost/network/uri/http/detail/parse_specific.hpp
--- netlib_0_4/boost/network/uri/http/detail/parse_specific.hpp	2009-12-10 10:28:20.000000000 +0100
+++ netlib_0_4/boost/network/uri/http/detail/parse_specific.hpp	2009-12-14 13:51:49.000000000 +0100
@@ -11,6 +11,8 @@
 #include <boost/network/uri/detail/constants.hpp>
 #include <boost/network/protocol/http/traits/string.hpp>
 
+#include <boost/algorithm/string/case_conv.hpp>
+
 namespace boost { namespace network { namespace uri { 
 
     namespace detail {
@@ -51,6 +53,10 @@
                             uri_parts<tags::http> & parts
                      ) 
             {
+                // For resiliency, programs interpreting URI should treat upper
+                // case letters as equivalent to lower case in scheme names
+                boost::to_lower(parts.scheme);
+
                 // Require that parts.scheme is either http or https
                 if (parts.scheme.size() < 4)
                     return false;
@@ -68,10 +74,13 @@
                 using spirit::ascii::space;
                 using spirit::ascii::alnum;
                 using spirit::ascii::punct;
+                using spirit::ascii::xdigit;
                 using spirit::qi::lexeme;
-                using spirit::qi::uint_;
+                using spirit::qi::ushort_;
                 using spirit::qi::digit;
                 using spirit::qi::rule;
+                using spirit::qi::repeat;
+                using spirit::qi::raw;
                 using fusion::tie;
                 
                 typedef string<tags::http>::type string_type;
@@ -82,7 +91,7 @@
                 fusion::tuple<
                     optional<string_type> &,
                     string_type &,
-                    optional<uint32_t> &,
+                    optional<uint16_t> &,
                     optional<string_type> &,
                     optional<string_type> &,
                     optional<string_type> &
@@ -96,6 +105,10 @@
                                 parts.fragment
                            );
 
+                rule<iterator, string_type::value_type()> reserved = char_(";/?:@&=+$,");
+                rule<iterator, string_type::value_type()> unreserved = alnum | char_("-_.!~*'()");
+                rule<iterator, string_type()> escaped = char_("%") > repeat(2)[xdigit];
+
                 hostname<tags::http>::parser<iterator> hostname;
                 bool ok = parse(
                         start_, end_,
@@ -106,10 +119,10 @@
                             >> '@'
                             ]
                          >> hostname
-                         >> -lexeme[':' >> uint_]
+                         >> -lexeme[':' >> ushort_]
                          >> -lexeme['/' >> *((alnum|punct) - '?')]
-                         >> -lexeme['?' >> *((alnum|punct) - '#')]
-                         >> -lexeme['#' >> *(alnum|punct)]
+                         >> -lexeme['?' >> raw[*(reserved | unreserved | escaped)]]
+                         >> -lexeme['#' >> raw[*(reserved | unreserved | escaped)]]
                         ),
                         result
                         );
diff -ru netlib_0_4/boost/network/uri/http/detail/uri_parts.hpp netlib_0_4/boost/network/uri/http/detail/uri_parts.hpp
--- netlib_0_4/boost/network/uri/http/detail/uri_parts.hpp	2009-12-10 10:28:20.000000000 +0100
+++ netlib_0_4/boost/network/uri/http/detail/uri_parts.hpp	2009-12-14 00:13:54.000000000 +0100
@@ -19,7 +19,7 @@
             string_type scheme_specific_part;
             optional<string_type> user_info;
             string_type host;
-            optional<uint32_t> port;
+            optional<uint16_t> port;
             optional<string_type> path;
             optional<string_type> query;
             optional<string_type> fragment;
diff -ru netlib_0_4/boost/network/uri/http/uri.hpp netlib_0_4/boost/network/uri/http/uri.hpp
--- netlib_0_4/boost/network/uri/http/uri.hpp	2009-12-10 10:28:20.000000000 +0100
+++ netlib_0_4/boost/network/uri/http/uri.hpp	2009-12-14 00:14:03.000000000 +0100
@@ -30,7 +30,7 @@
                 return parts_.host;
             }
 
-            uint32_t port() const {
+            uint16_t port() const {
                 return parts_.port ? *parts_.port : 
                     (parts_.scheme == "https" ? 443u : 80u);
             }
@@ -60,7 +60,7 @@
         }
 
     inline
-        uint32_t
+        uint16_t
         port(basic_uri<tags::http> const & uri) {
             return uri.port();
         }
diff -ru netlib_0_4/boost/network/uri/http/uri_concept.hpp netlib_0_4/boost/network/uri/http/uri_concept.hpp
--- netlib_0_4/boost/network/uri/http/uri_concept.hpp	2009-12-10 10:28:20.000000000 +0100
+++ netlib_0_4/boost/network/uri/http/uri_concept.hpp	2009-12-14 00:14:10.000000000 +0100
@@ -19,7 +19,7 @@
             {
                 string_type user_info_ = user_info(uri);
                 string_type host_ = host(uri);
-                uint32_t port_ = port(uri);
+                uint16_t port_ = port(uri);
                 port_ = 0u;
                 string_type path_ = path(uri);
                 string_type query_ = query(uri);