Hi Tim,

As you state, maybe we can wait a little before making all these
changes to Wget for lowercase conversions.

Meanwhile, please look at the attached patch. It fixes a potential
memory leak in Wget, in the case where lowercase conversion of either
cookie_domain or host fails but not both.
The patch also fixes the configure file to correctly detect libpsl.

On Mon, Jul 21, 2014 at 2:28 AM, Tim Rühsen <[email protected]> wrote:
> Am Montag, 21. Juli 2014, 00:58:49 schrieb Darshit Shah:
>> On Mon, Jul 7, 2014 at 8:14 PM, Tim Ruehsen <[email protected]> wrote:
>> > One more comment / idea.
>> >
>> > The 'cookie_domain' comes from a HTTP Set-Cookie repsonse header and thus
>> > is (must be) toASCII() encoded (=puncode). Of course this has to be
>> > checked when normalizing the incoming cookie data. A cookie comain having
>> > non-ascii characters should simply be dropped.
>> >
>> > The whole check only works when 'host' is also in toASCII() (punycode)
>> > form.
>> >
>> > Assuming this, psl_str_to_utf8lower() just reduces to a ASCII lowercase
>> > converter.
>> >
>> > If Wget would convert any domain name input to punycode + lowercase, many
>> > conversions would fall away and case-function would not be needed (e.g.
>> > calling strcmp instead of strcasecmp, the need to call
>> > psl_str_to_utf8lower() would fall away, etc.).
>> >
>> > What do you think ?
>>
>> Sounds like an interesting idea to me. Although, how do you suggest we
>> go about converting the domain names to lowercase?
>> I'm not sure about this, so I confirm first. After running the input
>> domain names through toASCII(), can we simply pass the string to
>> tolower() to get the lowercase version?
>
> That depends on the library you use.
>
> libidn's toASCII() has a built-in lowercase conversion. So the input case does
> not matter, the output is always lowercase ASCII.
>
> Using libidn2, you have to convert to lowercase first yourself (e.g. using
> libunistring). The output is of course lowercase ASCII.
>
> Using libicu, you have to convert to lowercase first yourself (but libicu is
> able to do that). The output is of course lowercase ASCII.
>
>
> What I thought of (what I did in Mget), 'normalize' every domain name before
> further processing/comparing. 'normalizing' means trimming, percent-decoding,
> charset transcoding to UTF-8, toASCII() conversion (with or without prior
> lowercasing, depending on the IDN library used).
>
> Having that, Wget's code just needs strcmp() to compare domains and
> $ wget übel.de Übel.de xn--bel-goa.de
> should reduce to a download of a single file (xn--bel-goa.de/index.html)
> (but maybe it is Wget's policy to explictely download every URL given on the
> command line, even if it is always the same !?)
>
> There is domain name input from the command line (URL's and a few options like
> -D/--domains), from local files (-i/--input-file) and from remote files.
>
> But Darshit, maybe this should have low priority. It is more a kind of 'code
> polishing'. I am looking forward to start a Wget version based on a libwget in
> the next 6-12 months. Most of the code is already working in the Mget project,
> but everything needs polishing (e.g. APi docs and more of Wget functionality,
> -k/convert-links implemented last week ;-) And than the day comes to merge
> Wget and Mget... if that finds any friends ;-)
>
>>
>> > Tim
>> >
>> > On Monday 07 July 2014 17:08:48 Darshit Shah wrote:
>> >> +  if (psl_str_to_utf8lower (cookie_domain, NULL,
>> >> NULL,&cookie_domain_lower)>
>> > == PSL_SUCCESS &&
>> >
>> >> +      psl_str_to_utf8lower (host, NULL, NULL, &host_lower) ==
>> >> PSL_SUCCESS)
>> >> +    {
>> >> +      is_acceptable = psl_is_cookie_domain_acceptable (psl,
>> >> host_lower, cookie_domain_lower);
>> >> +    }
>> >> +  else
>> >> +    {
>> >> +        DEBUGP (("libpsl unable to parse domain name. "
>> >> +                 "Falling back to simple heuristics.\n"));
>> >> +        goto no_psl;
>> >> +    }
>



-- 
Thanking You,
Darshit Shah
From a44841cbe2abe712de84d7413c31fc14b44225a7 Mon Sep 17 00:00:00 2001
From: Darshit Shah <[email protected]>
Date: Mon, 21 Jul 2014 13:25:54 +0530
Subject: [PATCH] Fix potential memory leak and libpsl configure

---
 ChangeLog     |  4 ++++
 configure.ac  | 16 +++++++++-------
 src/ChangeLog |  5 +++++
 src/cookies.c |  5 ++++-
 4 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 2bfae67..0cba66d 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,7 @@
+2014-07-21  Darshit Shah  <[email protected]>
+
+	* configure.ac: Fix check for Libpsl
+
 2014-06-28  Giuseppe Scrivano  <[email protected]>
 
 	* cfg.mk (local-checks-to-skip): Remove some checks.
diff --git a/configure.ac b/configure.ac
index abc92fb..56edc00 100644
--- a/configure.ac
+++ b/configure.ac
@@ -63,7 +63,14 @@ dnl
 
 AC_ARG_WITH(libpsl,
     AS_HELP_STRING([--without-libpsl],
-                   [disable support for libpsl cookie checking.]))
+                   [disable support for libpsl cookie checking.]),
+    [],
+    [AC_SEARCH_LIBS(psl_builtin, psl,
+                    [AC_DEFINE([WITH_LIBPSL], [1], [PSL Support Enabled])],
+                    [AC_MSG_WARN(*** libpsl not found. Falling back to tail matching)])
+    ])
+AS_IF([test x$ac_cv_search_psl_builtin != "x-psl"], [ENABLE_PSL=no],
+      [ENABLE_PSL=yes])
 
 AC_ARG_WITH(ssl,
 [[  --without-ssl           disable SSL autodetection
@@ -238,11 +245,6 @@ dnl
 dnl Checks for libraries.
 dnl
 
-AS_IF([test x"$with_libpsl" != xno], [
-  with_libpsl=yes
-  AC_CHECK_LIB([psl], [psl_builtin])
-])
-
 AS_IF([test x"$with_zlib" != xno], [
   with_zlib=yes
   AC_CHECK_LIB(z, compress)
@@ -593,7 +595,7 @@ AC_MSG_NOTICE([Summary of build options:
   Libs:              $LIBS
   SSL:               $with_ssl
   Zlib:              $with_zlib
-  PSL:               $with_libpsl
+  PSL:               $ENABLE_PSL
   Digest:            $ENABLE_DIGEST
   NTLM:              $ENABLE_NTLM
   OPIE:              $ENABLE_OPIE
diff --git a/src/ChangeLog b/src/ChangeLog
index 91eda5f..e668583 100644
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,3 +1,8 @@
+2014-07-21  Darshit Shah  <[email protected]>
+
+	* cookies.c (check_domain_match): Fix a potential memory leak when checking
+	cookie domain names
+
 2014-07-07  Tomas Hozza  <[email protected]>
 
 	* iri.c (locale_to_utf8): Fix checking of iconv_open return code.
diff --git a/src/cookies.c b/src/cookies.c
index 76301ac..bf872a8 100644
--- a/src/cookies.c
+++ b/src/cookies.c
@@ -546,9 +546,12 @@ check_domain_match (const char *cookie_domain, const char *host)
   xfree (cookie_domain_lower);
   xfree (host_lower);
 
-  return true ? (is_acceptable == 1) : false;
+  return is_acceptable == 1;
 
 no_psl:
+  /* Cleanup the PSL pointers first */
+  xfree (cookie_domain_lower);
+  xfree (host_lower);
 #endif
 
   /* For efficiency make some elementary checks first */
-- 
2.0.2

Reply via email to