Hi Tim, As you state, maybe we can wait a little before making all these changes to Wget for lowercase conversions.
Meanwhile, please look at the attached patch. It fixes a potential memory leak in Wget, in the case where lowercase conversion of either cookie_domain or host fails but not both. The patch also fixes the configure file to correctly detect libpsl. On Mon, Jul 21, 2014 at 2:28 AM, Tim Rühsen <[email protected]> wrote: > Am Montag, 21. Juli 2014, 00:58:49 schrieb Darshit Shah: >> On Mon, Jul 7, 2014 at 8:14 PM, Tim Ruehsen <[email protected]> wrote: >> > One more comment / idea. >> > >> > The 'cookie_domain' comes from a HTTP Set-Cookie repsonse header and thus >> > is (must be) toASCII() encoded (=puncode). Of course this has to be >> > checked when normalizing the incoming cookie data. A cookie comain having >> > non-ascii characters should simply be dropped. >> > >> > The whole check only works when 'host' is also in toASCII() (punycode) >> > form. >> > >> > Assuming this, psl_str_to_utf8lower() just reduces to a ASCII lowercase >> > converter. >> > >> > If Wget would convert any domain name input to punycode + lowercase, many >> > conversions would fall away and case-function would not be needed (e.g. >> > calling strcmp instead of strcasecmp, the need to call >> > psl_str_to_utf8lower() would fall away, etc.). >> > >> > What do you think ? >> >> Sounds like an interesting idea to me. Although, how do you suggest we >> go about converting the domain names to lowercase? >> I'm not sure about this, so I confirm first. After running the input >> domain names through toASCII(), can we simply pass the string to >> tolower() to get the lowercase version? > > That depends on the library you use. > > libidn's toASCII() has a built-in lowercase conversion. So the input case does > not matter, the output is always lowercase ASCII. > > Using libidn2, you have to convert to lowercase first yourself (e.g. using > libunistring). The output is of course lowercase ASCII. > > Using libicu, you have to convert to lowercase first yourself (but libicu is > able to do that). The output is of course lowercase ASCII. > > > What I thought of (what I did in Mget), 'normalize' every domain name before > further processing/comparing. 'normalizing' means trimming, percent-decoding, > charset transcoding to UTF-8, toASCII() conversion (with or without prior > lowercasing, depending on the IDN library used). > > Having that, Wget's code just needs strcmp() to compare domains and > $ wget übel.de Übel.de xn--bel-goa.de > should reduce to a download of a single file (xn--bel-goa.de/index.html) > (but maybe it is Wget's policy to explictely download every URL given on the > command line, even if it is always the same !?) > > There is domain name input from the command line (URL's and a few options like > -D/--domains), from local files (-i/--input-file) and from remote files. > > But Darshit, maybe this should have low priority. It is more a kind of 'code > polishing'. I am looking forward to start a Wget version based on a libwget in > the next 6-12 months. Most of the code is already working in the Mget project, > but everything needs polishing (e.g. APi docs and more of Wget functionality, > -k/convert-links implemented last week ;-) And than the day comes to merge > Wget and Mget... if that finds any friends ;-) > >> >> > Tim >> > >> > On Monday 07 July 2014 17:08:48 Darshit Shah wrote: >> >> + if (psl_str_to_utf8lower (cookie_domain, NULL, >> >> NULL,&cookie_domain_lower)> >> > == PSL_SUCCESS && >> > >> >> + psl_str_to_utf8lower (host, NULL, NULL, &host_lower) == >> >> PSL_SUCCESS) >> >> + { >> >> + is_acceptable = psl_is_cookie_domain_acceptable (psl, >> >> host_lower, cookie_domain_lower); >> >> + } >> >> + else >> >> + { >> >> + DEBUGP (("libpsl unable to parse domain name. " >> >> + "Falling back to simple heuristics.\n")); >> >> + goto no_psl; >> >> + } > -- Thanking You, Darshit Shah
From a44841cbe2abe712de84d7413c31fc14b44225a7 Mon Sep 17 00:00:00 2001 From: Darshit Shah <[email protected]> Date: Mon, 21 Jul 2014 13:25:54 +0530 Subject: [PATCH] Fix potential memory leak and libpsl configure --- ChangeLog | 4 ++++ configure.ac | 16 +++++++++------- src/ChangeLog | 5 +++++ src/cookies.c | 5 ++++- 4 files changed, 22 insertions(+), 8 deletions(-) diff --git a/ChangeLog b/ChangeLog index 2bfae67..0cba66d 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,7 @@ +2014-07-21 Darshit Shah <[email protected]> + + * configure.ac: Fix check for Libpsl + 2014-06-28 Giuseppe Scrivano <[email protected]> * cfg.mk (local-checks-to-skip): Remove some checks. diff --git a/configure.ac b/configure.ac index abc92fb..56edc00 100644 --- a/configure.ac +++ b/configure.ac @@ -63,7 +63,14 @@ dnl AC_ARG_WITH(libpsl, AS_HELP_STRING([--without-libpsl], - [disable support for libpsl cookie checking.])) + [disable support for libpsl cookie checking.]), + [], + [AC_SEARCH_LIBS(psl_builtin, psl, + [AC_DEFINE([WITH_LIBPSL], [1], [PSL Support Enabled])], + [AC_MSG_WARN(*** libpsl not found. Falling back to tail matching)]) + ]) +AS_IF([test x$ac_cv_search_psl_builtin != "x-psl"], [ENABLE_PSL=no], + [ENABLE_PSL=yes]) AC_ARG_WITH(ssl, [[ --without-ssl disable SSL autodetection @@ -238,11 +245,6 @@ dnl dnl Checks for libraries. dnl -AS_IF([test x"$with_libpsl" != xno], [ - with_libpsl=yes - AC_CHECK_LIB([psl], [psl_builtin]) -]) - AS_IF([test x"$with_zlib" != xno], [ with_zlib=yes AC_CHECK_LIB(z, compress) @@ -593,7 +595,7 @@ AC_MSG_NOTICE([Summary of build options: Libs: $LIBS SSL: $with_ssl Zlib: $with_zlib - PSL: $with_libpsl + PSL: $ENABLE_PSL Digest: $ENABLE_DIGEST NTLM: $ENABLE_NTLM OPIE: $ENABLE_OPIE diff --git a/src/ChangeLog b/src/ChangeLog index 91eda5f..e668583 100644 --- a/src/ChangeLog +++ b/src/ChangeLog @@ -1,3 +1,8 @@ +2014-07-21 Darshit Shah <[email protected]> + + * cookies.c (check_domain_match): Fix a potential memory leak when checking + cookie domain names + 2014-07-07 Tomas Hozza <[email protected]> * iri.c (locale_to_utf8): Fix checking of iconv_open return code. diff --git a/src/cookies.c b/src/cookies.c index 76301ac..bf872a8 100644 --- a/src/cookies.c +++ b/src/cookies.c @@ -546,9 +546,12 @@ check_domain_match (const char *cookie_domain, const char *host) xfree (cookie_domain_lower); xfree (host_lower); - return true ? (is_acceptable == 1) : false; + return is_acceptable == 1; no_psl: + /* Cleanup the PSL pointers first */ + xfree (cookie_domain_lower); + xfree (host_lower); #endif /* For efficiency make some elementary checks first */ -- 2.0.2
