jorisvandenbossche commented on a change in pull request #12543:
URL: https://github.com/apache/arrow/pull/12543#discussion_r829124562



##########
File path: python/pyarrow/_csv.pyx
##########
@@ -546,7 +641,114 @@ cdef class ConvertOptions(_Weakrefable):
         produce a column of nulls (whose type is selected using
         `column_types`, or null by default).
         This option is ignored if `include_columns` is empty.
+
+    Example
+    -------
+
+    Defining an example file from bytes object:
+
+    >>> import io
+    >>> s = b'''1,2,3
+    ... Flamingo,2,01/03/2022
+    ... Horse,4,02/03/2022
+    ... Brittle stars,5,03/03/2022
+    ... Centipede,100,04/03/2022'''
+
+    Define date parsing format to get a timestamp type column
+    (in case dates are not in ISO format and not converted by default):
+
+    >>> convert_options = csv.ConvertOptions(
+    ...                   timestamp_parsers=["%m/%d/%Y", "%d/%m/%Y"])
+    >>> csv.read_csv(io.BytesIO(s), convert_options=convert_options)
+    pyarrow.Table
+    animals: string
+    n_legs: int64
+    entry: timestamp[s]
+    ----
+    animals: [["Flamingo","Horse","Brittle stars","Centipede"]]
+    n_legs: [[2,4,5,100]]
+    entry: [[2022-01-03 00:00:00,2022-02-03 00:00:00,
+    2022-03-03 00:00:00,2022-04-03 00:00:00]]
+
+    Specify which columns to read and add an additional column:
+
+    >>> convert_options = csv.ConvertOptions(
+    ...                   include_columns=["animals", "location"],
+    ...                   include_missing_columns=True)
+    >>> csv.read_csv(io.BytesIO(s), convert_options=convert_options)
+    pyarrow.Table
+    animals: string
+    location: null
+    ----
+    animals: [["Flamingo","Horse","Brittle stars","Centipede"]]
+    location: [4 nulls]
+
+    Define a column as a dictionary:
+
+    >>> convert_options = csv.ConvertOptions(
+    ...                   include_columns=["animals"],

Review comment:
       Leaving out this selection for this example, it will show that the dict 
encoding only happens for the string column and not for the numerical column

##########
File path: python/pyarrow/_csv.pyx
##########
@@ -546,7 +641,114 @@ cdef class ConvertOptions(_Weakrefable):
         produce a column of nulls (whose type is selected using
         `column_types`, or null by default).
         This option is ignored if `include_columns` is empty.
+
+    Example
+    -------
+
+    Defining an example file from bytes object:
+
+    >>> import io
+    >>> s = b'''1,2,3
+    ... Flamingo,2,01/03/2022
+    ... Horse,4,02/03/2022
+    ... Brittle stars,5,03/03/2022
+    ... Centipede,100,04/03/2022'''
+
+    Define date parsing format to get a timestamp type column
+    (in case dates are not in ISO format and not converted by default):
+
+    >>> convert_options = csv.ConvertOptions(
+    ...                   timestamp_parsers=["%m/%d/%Y", "%d/%m/%Y"])
+    >>> csv.read_csv(io.BytesIO(s), convert_options=convert_options)
+    pyarrow.Table
+    animals: string
+    n_legs: int64
+    entry: timestamp[s]
+    ----
+    animals: [["Flamingo","Horse","Brittle stars","Centipede"]]
+    n_legs: [[2,4,5,100]]
+    entry: [[2022-01-03 00:00:00,2022-02-03 00:00:00,
+    2022-03-03 00:00:00,2022-04-03 00:00:00]]
+
+    Specify which columns to read and add an additional column:
+
+    >>> convert_options = csv.ConvertOptions(
+    ...                   include_columns=["animals", "location"],
+    ...                   include_missing_columns=True)
+    >>> csv.read_csv(io.BytesIO(s), convert_options=convert_options)
+    pyarrow.Table
+    animals: string
+    location: null
+    ----
+    animals: [["Flamingo","Horse","Brittle stars","Centipede"]]
+    location: [4 nulls]
+
+    Define a column as a dictionary:
+
+    >>> convert_options = csv.ConvertOptions(
+    ...                   include_columns=["animals"],
+    ...                   auto_dict_encode=True)
+    >>> csv.read_csv(io.BytesIO(s), convert_options=convert_options)
+    pyarrow.Table
+    animals: dictionary<values=string, indices=int32, ordered=0>
+    ----
+    animals: [  -- dictionary:
+    ["Flamingo","Horse","Brittle stars","Centipede"]  -- indices:
+    [0,1,2,3]]
+
+    Set upper limit for the number of categories. If the categories
+    is more than the limit, the conversion to dictionary will not
+    happen:
+
+    >>> convert_options = csv.ConvertOptions(
+    ...                   include_columns=["animals"],
+    ...                   auto_dict_encode=True,
+    ...                   auto_dict_max_cardinality=2)
+    >>> csv.read_csv(io.BytesIO(s), convert_options=convert_options)
+    pyarrow.Table
+    animals: string
+    ----
+    animals: [["Flamingo","Horse","Brittle stars","Centipede"]]
+
+    Define strings that should be set to missing:
+
+    >>> convert_options = csv.ConvertOptions(include_columns=["animals"],
+    ...                                      strings_can_be_null = True,
+    ...                                      null_values=["Horse"])
+    >>> csv.read_csv(io.BytesIO(s), convert_options=convert_options)
+    pyarrow.Table
+    animals: string
+    ----
+    animals: [["Flamingo",null,"Brittle stars","Centipede"]]
+
+    Define values to be True and False when converting a column
+    into a bool type:
+
+    >>> convert_options = csv.ConvertOptions(
+    ...                   include_columns=["animals"],
+    ...                   false_values=["Flamingo","Horse"],
+    ...                   true_values=["Brittle stars","Centipede"])
+    >>> csv.read_csv(io.BytesIO(s), convert_options=convert_options)
+    pyarrow.Table
+    animals: bool
+    ----
+    animals: [[false,false,true,true]]
+
+    Change the type of a column:

Review comment:
       Suggestion to move this example towards the beginning of this set of 
examples, as I think specifying the column type might be one of the most 
typical things to do.

##########
File path: python/pyarrow/_csv.pyx
##########
@@ -546,7 +641,114 @@ cdef class ConvertOptions(_Weakrefable):
         produce a column of nulls (whose type is selected using
         `column_types`, or null by default).
         This option is ignored if `include_columns` is empty.
+
+    Example
+    -------
+
+    Defining an example file from bytes object:
+
+    >>> import io
+    >>> s = b'''1,2,3

Review comment:
       same here




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to