zeroshade commented on code in PR #34558:
URL: https://github.com/apache/arrow/pull/34558#discussion_r1135724467
##########
go/arrow/csv/reader.go:
##########
@@ -469,6 +469,10 @@ func (r *Reader) initFieldConverter(bldr array.Builder)
func(string) {
return func(s string) {
r.parseList(bldr, s)
}
+ case arrow.BinaryDataType:
+ return func(s string) {
+ r.parseBinaryDataType(bldr, s)
+ }
Review Comment:
Can you add a case for `*arrow.LargeStringType` above this (such as with the
string case) so that it doesn't get dropped into this case too? (I'm pretty
sure the LargeString type does meet this interface)
##########
go/arrow/csv/reader.go:
##########
@@ -469,6 +469,10 @@ func (r *Reader) initFieldConverter(bldr array.Builder)
func(string) {
return func(s string) {
r.parseList(bldr, s)
}
+ case arrow.BinaryDataType:
+ return func(s string) {
+ r.parseBinaryDataType(bldr, s)
Review Comment:
Like the string case, we probably want to check the `r.stringsCanBeNull`
value to determine whether we want to check for nulls or just let it be empty
values etc.
##########
go/arrow/csv/transformer.go:
##########
@@ -215,6 +216,15 @@ func (w *Writer) transformColToStringArr(typ
arrow.DataType, col arrow.Array) []
res[i] = w.nullValue
}
}
+ case arrow.BinaryDataType:
Review Comment:
same as above, make sure that `LargeString` is covered above this case so it
doesn't fall into here.
##########
go/arrow/csv/reader.go:
##########
@@ -755,6 +759,18 @@ func (r *Reader) parseList(field array.Builder, str
string) {
}
}
+func (r *Reader) parseBinaryDataType(field array.Builder, str string) {
+ if r.isNull(str) {
+ field.AppendNull()
+ return
+ }
+ decodedVal, err := base64.StdEncoding.DecodeString(str)
Review Comment:
do we want to check both the padded and non-padded encodings? (ie: both
`StdEncoding` and `RawStdEncoding`, checking the other if the first one errors)
If we don't check both encodings, we should make sure the documentation
states which we use.
##########
go/arrow/csv/testdata/types.csv:
##########
@@ -15,7 +15,7 @@
# specific language governing permissions and limitations
# under the License.
#
-## supported types:
bool;int8;int16;int32;int64;uint8;uint16;uint32;uint64;float32;float64;string;timestamp
-true;-1;-1;-1;-1;1;1;1;1;1.1;1.1;str-1;2022-05-09T00:01:01;{1,2,3}
-false;-2;-2;-2;-2;2;2;2;2;2.2;2.2;str-2;2022-05-09T23:59:59;{}
-null;NULL;null;N/A;;null;null;null;null;null;null;null;null;null
\ No newline at end of file
+## supported types:
bool;int8;int16;int32;int64;uint8;uint16;uint32;uint64;float32;float64;string;timestamp;list(i64);binary
+true;-1;-1;-1;-1;1;1;1;1;1.1;1.1;str-1;2022-05-09T00:01:01;{1,2,3};AAEC
+false;-2;-2;-2;-2;2;2;2;2;2.2;2.2;str-2;2022-05-09T23:59:59;{};AwQF
+null;NULL;null;N/A;;null;null;null;null;null;null;null;null;null;null
Review Comment:
Can we make sure there's a test case for testing an empty value in addition
to a null value?
##########
go/arrow/csv/transformer.go:
##########
@@ -215,6 +216,15 @@ func (w *Writer) transformColToStringArr(typ
arrow.DataType, col arrow.Array) []
res[i] = w.nullValue
}
}
+ case arrow.BinaryDataType:
+ arr := col.(*array.Binary)
Review Comment:
you can't assume this, it could also be `*array.LargeBinary` or otherwise.
If you don't want to duplicate the code and create cases for both `LargeBinary`
and `Binary` separately, you can create and use an interface which contains
`Value(int) []byte` which would cover both `Binary` and `LargeBinary` types.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]