This depends on my patch 0015.
Since CSV escaping was entirely broken before that patch (however we decide to fix the problem), let's also fix the escaping syntax itself, without worrying about backwards compatibility.

I tried to solve this according to Rob's comment on https://fedorahosted.org/freeipa/ticket/2227:
> Whatever we come up with we need a more consistent way to handle
> escape characters so it works the same on single and multi-value
> attributes.

My goal is that it will just do the right thing for 99.9% of uses, and the 0.1% (or people who want to write robust scripts¹) should find a short, clear paragraph in the documentation.

Of course, values without backslashes, double quotes and newlines are again completely unaffected.



Commit message:

Instead of the non-obvious semantics of Python's CSV module, just split on commas that aren't escaped by a backslash. Then, replace any commas that *are* escaped by backslashes by just commas, and remove initial whitespace as needed.


This simpler approach fits our use case better: most importantly, it doesn't require literal backslashes to be escaped, and doesn't "eat" backslashes (unless they precede a comma). This means the CSV multivalue format is, except for commas, the same as for single values. It also does away with CSV's double-quoting syntax, making the format more predictable (=scriptable).

A drawback is that this format doesn't allow values that end with a backslash, unless they're given at the end. The workaround is to either give them at the end, or if more are needed,just use multiple options:
   --someopt='val1,val2\' --someopt='val3\'

Values that end with backslashes are so rare, this shouldn't be a problem in practice.


Also, update the tests.


Fixed as a side effect:
https://fedorahosted.org/freeipa/ticket/2402
https://fedorahosted.org/freeipa/ticket/2417

--
Petr³


---
¹ These should really use JSON/XMLRPC directly :)
From fe0d95a4c21d6889db3684f1d1d148ec1b40566d Mon Sep 17 00:00:00 2001
From: Petr Viktorin <pvikt...@redhat.com>
Date: Mon, 27 Feb 2012 08:08:35 -0500
Subject: [PATCH] Simplify CSV escaping syntax

Instead of the non-obvious semantics of Python's CSV module, just split
on commas that aren't escaped by a backslash.
Then, replace any commas that *are* escaped by backslashes by just
commas, and remove initial whitespace as needed.

This simpler approach fits our use case better: most importantly, it
doesn't require literal backslashes to be escaped, and doesn't "eat"
backslashes (unless they precede a comma). This means the CSV
multivalue format is, except for commas, the same as for single values.
It also does away with CSV's double-quoting syntax, making the format
more predictable (=scriptable).

A drawback is that this format doesn't allow values that end with a
backslash, unless they're given at the end. The workaround is to either
give them at the end, or if more are needed,just use multiple options:
   --someopt='val1,val2\' --someopt='val3\'

Values that end with backslashes are so rare, this shouldn't be a
problem in practice.

Also, update the tests.

Fixed as a side effect:
https://fedorahosted.org/freeipa/ticket/2402
https://fedorahosted.org/freeipa/ticket/2417
---
 ipalib/parameters.py                 |   42 ++++++++++++++-------------------
 tests/test_ipalib/test_parameters.py |   22 ++++++-----------
 2 files changed, 26 insertions(+), 38 deletions(-)

diff --git a/ipalib/parameters.py b/ipalib/parameters.py
index cac3441506bdea65cc1fcf9c85164f56d1618808..7912d0e46f2f7af9266dbd99894a7256953e5213 100644
--- a/ipalib/parameters.py
+++ b/ipalib/parameters.py
@@ -679,23 +679,6 @@ class Param(ReadOnly):
         kw.update(overrides)
         return klass(name, *self.rules, **kw)
 
-    # The following 2 functions were taken from the Python
-    # documentation at http://docs.python.org/library/csv.html
-    def __utf_8_encoder(self, unicode_csv_data):
-        for line in unicode_csv_data:
-            yield line.encode('utf-8')
-
-    def __unicode_csv_reader(self, unicode_csv_data, dialect=csv.excel, **kwargs):
-        # csv.py doesn't do Unicode; encode temporarily as UTF-8:
-        csv_reader = csv.reader(self.__utf_8_encoder(unicode_csv_data),
-                                dialect=dialect,
-                                delimiter=self.csv_separator, escapechar='\\',
-                                skipinitialspace=self.csv_skipspace,
-                                **kwargs)
-        for row in csv_reader:
-            # decode UTF-8 back to Unicode, cell by cell:
-            yield [unicode(cell, 'utf-8') for cell in row]
-
     def split_csv(self, value):
         """Split CSV strings into individual values.
 
@@ -719,14 +702,25 @@ class Param(ReadOnly):
         if self.csv:
             if type(value) not in (tuple, list):
                 value = (value,)
-            newval = ()
+            # Split on a comma that is not preceded by a backslash
+            # This means backslashes are not "eaten" unless they precede a
+            # comma.
+            # To include a backslash as the last character, simply supply
+            # that value last.
+            # ("comma" is of course configurable)
+            separator = self.csv_separator
+            separator_re = r'(?<!\\)' + re.escape(separator)
+            newval = []
             for v in value:
-                if isinstance(v, basestring):
-                    csvreader = self.__unicode_csv_reader([unicode(v)])
-                    newval += tuple(csvreader.next()) #pylint: disable=E1101
-                else:
-                    newval += (v,)
-            return newval
+                 if isinstance(v, basestring):
+                    for part in re.split(separator_re, v):
+                        part = part.replace('\\' + separator, separator)
+                        if self.csv_skipspace:
+                            part = part.lstrip()
+                        newval.append(part)
+                 else:
+                    newval.append(v)
+            return tuple(newval)
         else:
             return value
 
diff --git a/tests/test_ipalib/test_parameters.py b/tests/test_ipalib/test_parameters.py
index 83c33ddfce073d3aa90691b62cab30f30ffcee1e..425ed95559985904cb6ff9ba1fe78e412a969cab 100644
--- a/tests/test_ipalib/test_parameters.py
+++ b/tests/test_ipalib/test_parameters.py
@@ -641,12 +641,10 @@ class test_Param(ClassChecker):
         """
         o = self.cls('my_list+', csv=True)
         n = o.split_csv('a,b')
-        assert type(n) is tuple
-        assert len(n) is 2
+        assert n == ('a', 'b')
 
-        n = o.split_csv('bar,   "hi, there",foo')
-        assert type(n) is tuple
-        assert len(n) is 3
+        n = o.split_csv('bar,   hi\, there,foo')
+        assert n == ('bar', 'hi, there', 'foo')
 
     def test_split_csv_separator(self):
         """
@@ -655,12 +653,10 @@ class test_Param(ClassChecker):
         o = self.cls('my_list+', csv=True, csv_separator='|')
 
         n = o.split_csv('a')
-        assert type(n) is tuple
-        assert len(n) is 1
+        assert n == ('a',)
 
         n = o.split_csv('a|b')
-        assert type(n) is tuple
-        assert len(n) is 2
+        assert n == ('a', 'b')
 
     def test_split_csv_skipspace(self):
         """
@@ -669,13 +665,11 @@ class test_Param(ClassChecker):
         o = self.cls('my_list+', csv=True, csv_skipspace=False)
 
         n = o.split_csv('a')
-        assert type(n) is tuple
-        assert len(n) is 1
+        assert n == ('a',)
 
-        n = o.split_csv('a, "b,c", d')
+        n = o.split_csv('a, b\,c, d')
-        assert type(n) is tuple
-        # the output w/o skipspace is ['a',' "b','c"',' d']
-        assert len(n) is 4
+        assert n == ('a', ' b,c', ' d')
 
 
 class test_Flag(ClassChecker):
-- 
1.7.7.6

_______________________________________________
Freeipa-devel mailing list
Freeipa-devel@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-devel

Reply via email to