Hello,

David and I have been discussing for the past week one of the consequences of QUrl operating on encoded data only in Qt 5. There are a few use cases where a fully decoded path is necessary.

== Rationale ==

(skip to the proposal if you find this lengthy)

I've already had to implement the full decoding so that QUrl::toLocalFile() would work. But the same process might be necessary for non-local files. For example, from qnetworkaccessftpbackend.cpp:

    if (operation() == QNetworkAccessManager::GetOperation) {
        setCachingEnabled(true);
        ftp->get(url().path(), 0, type);
    } else {
        ftp->put(uploadDevice, url().path(), type);
    }

If the URL contained a percent-encoded character that QUrl::path() doesn't decode, it will remain in the path and be sent to the FTP server. More than likely, that's not what was intended.

The characters that QUrl does not decode under any circumstances are:
 - control characters between 0x00 and 0x1F
 - the percent character itself (0x25)
 - the delete control character (0x7F)
 - high-bit byte sequences that cannot be decoded as UTF-8

Especially because of the last category, the percent sign can never be decoded. Those arbitrary binary sequences can appear anywhere in the URL's user info, path, query or fragment, and the code dealing with them is common.

Moreover, encoded paths are the correct way to deal with paths in a URL's most common use: HTTP and the web. (As a twist of fate, the HTTP backend doesn't use QUrl::path(), but QUrl::toString(QUrl::RemoveAuthority | QUrl::RemoveFragment), so it gets both the path and the query.)

The same applies to setting the path. Often, the data comes in decoded form from other contexts, such as user input or an FTP directory listing. For those, encoding is necessary, as QUrl::fromLocalFile does:

    url.setPath(deslashified.replace(QLatin1Char('%'), QStringLiteral("%25")));

As David pointed out in an email to me, nobody without full URL training will be able to write this code properly.

== Proposal 1 ==

Add QUrl::decodedPath() and QUrl::setDecodedPath(), operating on QString, which do the necessary encoding and decoding.
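
To make the intent concrete, here is a rough sketch in plain C++ (std::string, no Qt) of the two directions such a pair of functions would have to cover. The names fullyDecodePath and encodeDecodedPath are hypothetical and this is not QUrl's implementation. The encoding side handles only '%', mirroring the replace() call in the rationale; anything else a real setter would have to escape is left out.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: numeric value of a hex digit, or -1 if c is not one.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical stand-in for the decoding direction: decode every
// well-formed %XX sequence, including %25 and control characters.
// Malformed sequences (e.g. a trailing "%4") are copied unchanged.
std::string fullyDecodePath(const std::string &encoded)
{
    std::string out;
    for (std::size_t i = 0; i < encoded.size(); ++i) {
        if (encoded[i] == '%' && i + 2 < encoded.size()) {
            const int hi = hexValue(encoded[i + 1]);
            const int lo = hexValue(encoded[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 2; // skip the two hex digits just consumed
                continue;
            }
        }
        out.push_back(encoded[i]);
    }
    return out;
}

// Hypothetical stand-in for the encoding direction: escape '%' so that
// a literal percent sign in decoded input survives a later full decode.
std::string encodeDecodedPath(const std::string &decoded)
{
    std::string out;
    for (char c : decoded) {
        if (c == '%')
            out += "%25";
        else
            out.push_back(c);
    }
    return out;
}
```

With these two helpers, a decoded name such as "50%2Foff" (a file whose name really contains the characters '%', '2' and 'F') round-trips unchanged through encodeDecodedPath followed by fullyDecodePath, which is the property the FTP case needs.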
QUrl::fromLocalFile will call that function instead of doing the work above, and QUrl::toLocalFile's extra decoder will be moved to the new function. The documentation will need to be updated to indicate when to use each.

== Problem 2 ==

The same problem that applies to the path can potentially apply to other components of the URL: user name, password, fragment and query. For example, imagine using the following randomly generated password (I generated it using KeePassX):

    url.setPassword("}}>b9o%kR(");

The above will trigger the tolerant mode's corrector, which will transform the '%' into "%25". However, when trying to send the password to the server, for example using QAuthenticator, we might make this mistake (copied from qnetworkaccessmanager.cpp):

    // if credentials are included in the url, then use them
    if (!url.userName().isEmpty() && !url.password().isEmpty()) {
        authenticator->setUser(url.userName());
        authenticator->setPassword(url.password());

[by the way, this code should test if !userInfo().isEmpty(), to catch empty passwords too]

We would then end up setting the password to "}}>b9o%25kR(", which is very likely to be incorrect.

== Proposal 2 ==

So instead of adding decodedPath(), decodedUserName(), decodedPassword(), etc. and cluttering the Qt 5 QUrl API the way the Qt 4 one was, there's a separate proposal:
 - add an option to QUrl::ComponentFormattingOptions to execute full decoding
 - add a new value to QUrl::ParsingMode to indicate fully decoded parsing
 - modify all setters so that they take QUrl::ParsingMode too (like QUrl::setUrl)

These new options should not be allowed in QUrl's constructor, QUrl::setUrl, QUrl::url, toString and toEncoded, for which full decoding creates ambiguous data (the root flaw in QUrl in Qt 4).

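
One way to see the ambiguity is the small sketch below, in plain C++ without Qt. The helpers decodeAll and fragmentOf are hypothetical and the URL is made up for illustration. Once a whole URL string is fully decoded, an encoded "%23" inside the path becomes indistinguishable from the '#' that starts the fragment.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode every well-formed %XX sequence in s.
std::string decodeAll(const std::string &s)
{
    auto hex = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    };
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '%' && i + 2 < s.size()) {
            const int hi = hex(s[i + 1]);
            const int lo = hex(s[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 2; // skip the two hex digits just consumed
                continue;
            }
        }
        out.push_back(s[i]);
    }
    return out;
}

// Hypothetical helper: the fragment is whatever follows the first '#',
// or the empty string if the URL has no '#'.
std::string fragmentOf(const std::string &url)
{
    const std::size_t pos = url.find('#');
    return pos == std::string::npos ? std::string() : url.substr(pos + 1);
}
```

For the made-up URL "ftp://host/a%23b", fragmentOf sees no fragment, which is right: the path is "/a#b". After decodeAll the string reads "ftp://host/a#b", and the same fragmentOf now reports a fragment "b" and a path of "/a". Per-component decoding does not have this problem, because the component boundaries are already known when the decoding happens.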
Pros over proposal 1:
 - less API clutter
 - centralised handling of the decoding and encoding
 - also allows for StrictMode setting of components and error reporting

Cons over proposal 1:
 - less discoverable, and harder to document that the option is needed in cases like the FTP one above

Which one shall it be?

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center
     Intel Sweden AB - Registration Number: 556189-6027
     Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden
_______________________________________________ Development mailing list Development@qt-project.org http://lists.qt-project.org/mailman/listinfo/development