Package: presage
Version: 0.8.8-1ubuntu3
Severity: normal
Tags: upstream patch

Dear Maintainer,

 Currently presage splits on apostrophes and is unable to represent words
containing apostrophes in the database (due to them not being escaped). This
results in presage being unable to correctly predict words like "don't".
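
 To illustrate the escaping half of the problem, here is a minimal sketch of
doubling single quotes before a word is placed inside a single-quoted SQL string
literal. The function names, and the table and column names in the query, are
assumptions made for illustration and are not taken from the presage source.

```cpp
#include <string>

// Hypothetical stand-in for a sanitizing step: doubles every single quote
// so the word can sit inside a single-quoted SQL string literal.
std::string escape_single_quotes(const std::string& word)
{
    std::string escaped;
    escaped.reserve(word.size() + 2);
    for (char c : word) {
        escaped += c;
        if (c == '\'') {
            escaped += '\'';  // ' becomes ''
        }
    }
    return escaped;
}

// Builds the kind of lookup that breaks on an unescaped apostrophe.
// The table and column names are illustrative only.
std::string build_unigram_query(const std::string& word)
{
    return "SELECT count FROM _1_gram WHERE word = '"
           + escape_single_quotes(word) + "';";
}
```

 Without the escaping step, the apostrophe in "don't" would end the string
literal early and leave the rest of the statement malformed.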

 Viewing the database for the English predictions shows that this is being
represented as a 2-gram of: "don" and "t".

 The expected result is that this would be represented as a 1-gram of: "don't".
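
 The difference can be reproduced with a small separator-based splitter. This is
only a sketch under my own assumptions: split_tokens is a hypothetical helper
and not presage's ForwardTokenizer, but it splits on a configurable set of
separator characters in the same spirit.

```cpp
#include <string>
#include <vector>

// Splits text into tokens, treating every character in `separators` as a
// token boundary. Hypothetical helper, not presage's ForwardTokenizer.
std::vector<std::string> split_tokens(const std::string& text,
                                      const std::string& separators)
{
    std::vector<std::string> tokens;
    std::string current;
    for (char c : text) {
        if (separators.find(c) != std::string::npos) {
            // Separator reached: close the current token, if any.
            if (!current.empty()) {
                tokens.push_back(current);
                current.clear();
            }
        } else {
            current += c;
        }
    }
    if (!current.empty()) {
        tokens.push_back(current);
    }
    return tokens;
}
```

 With the apostrophe in the separator set, split_tokens("don't", " '") yields
the two tokens "don" and "t". With it removed, split_tokens("don't", " ") yields
the single token "don't".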

 I realise you're also the upstream developer, so you'll have seen my upstream
bug report for this already. We're planning on including this patch temporarily
in the Ubuntu presage package and were wondering if you'd be interested in
including it in the Debian package until an upstream solution is ready. If so,
we can then just sync our Ubuntu package with your Debian package. Otherwise, if
you'd rather wait until you've got a more comprehensive upstream solution, we'll
just apply the patch temporarily in Ubuntu and then sync once it's fixed
upstream.

Thanks!

P.S.
 I'm still getting to grips with Debian packaging procedures, so apologies if
I've misstepped anywhere!



-- System Information:
Debian Release: jessie/sid
  APT prefers utopic-updates
  APT policy: (500, 'utopic-updates'), (500, 'utopic-security'), (500, 'utopic'), (100, 'utopic-backports')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.16.0-24-generic (SMP w/4 CPU cores)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages presage depends on:
ii  libc6         2.19-10ubuntu2
ii  libgcc1       1:4.9.1-16ubuntu6
ii  libncurses5   5.9+20140712-2ubuntu1
ii  libpresage1   0.8.8-1ubuntu3
ii  libsqlite3-0  3.8.6-1
ii  libstdc++6    4.9.1-16ubuntu6
ii  libtinfo5     5.9+20140712-2ubuntu1

presage recommends no packages.

presage suggests no packages.

-- no debconf information
Description: Allow words with apostrophes to be predicted
 Stop the tokenizer from splitting based on apostrophes and allow for the
 escaping of words containing apostrophes in the database connector.
Author: Michael Sheldon <[email protected]>
Forwarded: https://sourceforge.net/p/presage/patches/2/
Bug-Ubuntu: https://launchpad.net/bugs/1384800

--- presage-0.9.orig/src/lib/core/charsets.h
+++ presage-0.9/src/lib/core/charsets.h
@@ -180,7 +180,6 @@ const char DEFAULT_SEPARATOR_CHARS[]={
     '$',
     '%',
     '&',
-    '\'',
     '(',
     ')',
     '*',
--- presage-0.9.orig/src/lib/predictors/dbconnector/databaseConnector.cpp
+++ presage-0.9/src/lib/predictors/dbconnector/databaseConnector.cpp
@@ -30,6 +30,7 @@
 #include <sstream>
 #include <stdlib.h>
 #include <assert.h>
+#include <boost/algorithm/string/replace.hpp>
 
 DatabaseConnector::DatabaseConnector(const std::string database_name,
 				     const size_t cardinality,
@@ -293,12 +294,8 @@ std::string DatabaseConnector::buildValu
 
 std::string DatabaseConnector::sanitizeString(const std::string str) const
 {
-    // TODO
-    // just return the string for the time being
-    // REVISIT
-    // TO BE DONE
-    // TBD
-    return str;
+    // Escape single quotes
+    return boost::replace_all_copy(str, "'", "''");
 }
 
 int DatabaseConnector::extractFirstInteger(const NgramTable& table) const
--- presage-0.9.orig/src/tools/text2ngram.cpp
+++ presage-0.9/src/tools/text2ngram.cpp
@@ -174,7 +174,7 @@ int main(int argc, char* argv[])
 	std::ifstream infile(argv[i]);
 	ForwardTokenizer tokenizer(infile,
 				   " \f\n\r\t\v",
-				   "`~!@#$%^&*()_-+=\\|]}[{'\";:/?.>,<");
+				   "`~!@#$%^&*()_-+=\\|]}[{\";:/?.>,<");
 	tokenizer.lowercaseMode(lowercase);
 
 	// take care of first N-1 tokens
