[GitHub] [arrow] zeroshade commented on a diff in pull request #34558: feat: Add binary support for CSV

via GitHub Tue, 14 Mar 2023 08:26:47 -0700


zeroshade commented on code in PR #34558:
URL: https://github.com/apache/arrow/pull/34558#discussion_r1135724467



##########
go/arrow/csv/reader.go:
##########
@@ -469,6 +469,10 @@ func (r *Reader) initFieldConverter(bldr array.Builder) 
func(string) {
                return func(s string) {
                        r.parseList(bldr, s)
                }
+       case arrow.BinaryDataType:
+               return func(s string) {
+                       r.parseBinaryDataType(bldr, s)
+               }

Review Comment:
   Can you add a case for `*arrow.LargeStringType` above this (such as with the 
string case) so that it doesn't get dropped into this case too? (I'm pretty 
sure the LargeString type does meet this interface)



##########
go/arrow/csv/reader.go:
##########
@@ -469,6 +469,10 @@ func (r *Reader) initFieldConverter(bldr array.Builder) 
func(string) {
                return func(s string) {
                        r.parseList(bldr, s)
                }
+       case arrow.BinaryDataType:
+               return func(s string) {
+                       r.parseBinaryDataType(bldr, s)

Review Comment:
   Like the string case, we probably want to check the `r.stringsCanBeNull` 
value to determine whether we want to check for nulls or just let it be empty 
values etc.



##########
go/arrow/csv/transformer.go:
##########
@@ -215,6 +216,15 @@ func (w *Writer) transformColToStringArr(typ 
arrow.DataType, col arrow.Array) []
                                res[i] = w.nullValue
                        }
                }
+       case arrow.BinaryDataType:

Review Comment:
   same as above, make sure that `LargeString` is covered above this case so it 
doesn't fall into here.



##########
go/arrow/csv/reader.go:
##########
@@ -755,6 +759,18 @@ func (r *Reader) parseList(field array.Builder, str 
string) {
        }
 }
 
+func (r *Reader) parseBinaryDataType(field array.Builder, str string) {
+       if r.isNull(str) {
+               field.AppendNull()
+               return
+       }
+       decodedVal, err := base64.StdEncoding.DecodeString(str)

Review Comment:
   do we want to check both the padded and non-padded encodings? (ie: both 
`StdEncoding` and `RawStdEncoding`, checking the other if the first one errors)
   
   If we don't check both encodings, we should make sure the documentation 
states which we use.



##########
go/arrow/csv/testdata/types.csv:
##########
@@ -15,7 +15,7 @@
 # specific language governing permissions and limitations
 # under the License.
 #
-## supported types: 
bool;int8;int16;int32;int64;uint8;uint16;uint32;uint64;float32;float64;string;timestamp
-true;-1;-1;-1;-1;1;1;1;1;1.1;1.1;str-1;2022-05-09T00:01:01;{1,2,3}
-false;-2;-2;-2;-2;2;2;2;2;2.2;2.2;str-2;2022-05-09T23:59:59;{}
-null;NULL;null;N/A;;null;null;null;null;null;null;null;null;null
\ No newline at end of file
+## supported types: 
bool;int8;int16;int32;int64;uint8;uint16;uint32;uint64;float32;float64;string;timestamp;list(i64);binary
+true;-1;-1;-1;-1;1;1;1;1;1.1;1.1;str-1;2022-05-09T00:01:01;{1,2,3};AAEC
+false;-2;-2;-2;-2;2;2;2;2;2.2;2.2;str-2;2022-05-09T23:59:59;{};AwQF
+null;NULL;null;N/A;;null;null;null;null;null;null;null;null;null;null

Review Comment:
   Can we make sure there's a test case for testing an empty value in addition 
to a null value?



##########
go/arrow/csv/transformer.go:
##########
@@ -215,6 +216,15 @@ func (w *Writer) transformColToStringArr(typ 
arrow.DataType, col arrow.Array) []
                                res[i] = w.nullValue
                        }
                }
+       case arrow.BinaryDataType:
+               arr := col.(*array.Binary)

Review Comment:
   you can't assume this, it could also be `*array.LargeBinary` or otherwise. 
If you don't want to duplicate the code and create cases for both `LargeBinary` 
and `Binary` separately, you can create and use an interface which contains 
`Value(int) []byte` which would cover both `Binary` and `LargeBinary` types.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] zeroshade commented on a diff in pull request #34558: feat: Add binary support for CSV

Reply via email to