n3world commented on a change in pull request #10505:
URL: https://github.com/apache/arrow/pull/10505#discussion_r651865764



##########
File path: cpp/src/arrow/csv/options.cc
##########
@@ -17,11 +17,32 @@
 
 #include "arrow/csv/options.h"
 
+#include <iomanip>
+
 namespace arrow {
 namespace csv {
 
 ParseOptions ParseOptions::Defaults() { return ParseOptions(); }
 
+Status ParseOptions::Validate() const {
+  if (ARROW_PREDICT_FALSE((delimiter < ' ' && delimiter != '\t') || delimiter 
> '~')) {
+    return Status::Invalid(
+        "ParseOptions: delimiter must be a printable ascii char or '\\t': 0x",
+        std::setfill('0'), std::setw(2), std::hex, 
static_cast<uint16_t>(delimiter));

Review comment:
       A lot of the characters before ` ` are terminal control characters and 
the only ascii character after `~` is `del`. While these characters could be 
used in theory for a text file they would be very strange couldn't be displayed 
on a terminal. These are the characters this is excluding 
https://flaviocopes.com/non-printable-ascii-characters/ . Also, I don't think 
the user could specify values like `\n` or `\r` since that would break the 
parser since they have to be used for end of line. Currently the python layer 
requires all character values to be between 1 - 127 inclusive.
   I kept tab because tab separated values is a format which is used.

##########
File path: python/pyarrow/_csv.pyx
##########
@@ -58,6 +58,7 @@ cdef class ReadOptions(_Weakrefable):
         How much bytes to process at a time from the input stream.
         This will determine multi-threading granularity as well as
         the size of individual record batches or table chunks.
+        Minimum valid value for block size is 1KB

Review comment:
       Ooops, I wanted 1KB but tripped over tests. Guess I missed a spot




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to