[ https://issues.apache.org/jira/browse/BEAM-7996?focusedWorklogId=466320&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-466320 ]
ASF GitHub Bot logged work on BEAM-7996:
----------------------------------------
Author: ASF GitHub Bot
Created on: 04/Aug/20 16:48
Start Date: 04/Aug/20 16:48
Worklog Time Spent: 10m
Work Description: lostluck commented on a change in pull request #12426:
URL: https://github.com/apache/beam/pull/12426#discussion_r465189010
##########
File path:
model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml
##########
@@ -384,3 +384,31 @@ nested: false
examples:
"\x02\x01\x02\x01": {f_bool: True, f_bytes: null}
"\x02\x00\x00\x04ab\x00c": {f_bool: False, f_bytes: "ab\0c"}
+
+---
+
+# Binary data generated with the python SDK:
+#
+# import typing
+# import apache_beam as beam
+# class Test(typing.NamedTuple):
+#   f_map: typing.Mapping[str,int]
+# schema = beam.typehints.schemas.named_tuple_to_schema(Test)
+# coder = beam.coders.row_coder.RowCoder(schema)
+# print("payload = %s" % schema.SerializeToString())
+# examples = (Test(f_map={}),
+#             Test(f_map={"foo": 9001, "bar": 9223372036854775807}),
+#             Test(f_map={"everything": None, "is": None, "null!": None, "¯\_(ツ)_/¯": None}))
+# for example in examples:
+#   print("example = %s" % coder.encode(example))
+coder:
+  urn: "beam:coder:row:v1"
+  # f_map: map<str, nullable int64>
+  payload: "\n\x15\n\x05f_map\x1a\x0c*\n\n\x02\x10\x07\x12\x04\x08\x01\x10\x04\x12$d8c8f969-14e6-457f-a8b5-62a1aec7f1cd"
+  # map ordering is non-deterministic
+  non_deterministic: True
+nested: false
Review comment:
As it stands, this is confusing for SDK authors writing tests against
standard_coders.yaml: in the Go tests I've written, I need to explicitly
ignore the nested field for the row coders because they're all set to
nested: false rather than nested: true.
This is per my thread on the dev list:
https://lists.apache.org/thread.html/r7da098363e6ce607ce96f9fbedb08f9f4757bedd68846aaeba5dd4f0%40%3Cdev.beam.apache.org%3E
Portability only ever supports nested coders. The semantics of
standard_coders.yaml say that
```
# nested: a boolean meaning whether the coder was used in the nested context. Missing means to
# test both contexts, a shorthand for when the coder is invariant across context.
```
https://github.com/apache/beam/blob/587dde57cbb2b0095a1fa04b59798d1b62c66f18/model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml#L24
In other words, nested: true means the outermost encoding carries a length
prefix where the coder needs one, and nested: false means it does not.
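The rule quoted above can be sketched as a tiny test-harness helper. This is a minimal Python illustration, not code from Beam or from the Go tests mentioned here; `contexts_to_test` is a hypothetical name, and each standard_coders.yaml document is assumed to be already parsed into a dict.

```python
from typing import Any, Dict, List


def contexts_to_test(spec: Dict[str, Any]) -> List[bool]:
    """Hypothetical helper: the `nested` flags a standard_coders.yaml entry asks us to test."""
    if "nested" not in spec:
        # Missing key: shorthand for "invariant across context", so test both.
        return [True, False]
    # Explicit value: test only the context that was named.
    return [bool(spec["nested"])]


print(contexts_to_test({"coder": {"urn": "beam:coder:row:v1"}, "nested": False}))
```

Under this reading, the row coder entries (all nested: false) are only ever exercised in the outer context, even when their encoding is the same in both.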
Structurally, there's never a reason for a single schema value to carry a
wrapping length prefix (that is orthogonal to this aspect of the encoding, since
any subcomponent is always nested as needed), so none is included in the various
payload examples.
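To make the length-prefix point concrete, here is a minimal Python sketch that reproduces the two existing row examples from the diff context (`\x02\x01\x02\x01` and `\x02\x00\x00\x04ab\x00c`). It assumes a two-field schema (`f_bool: bool`, nullable `f_bytes: bytes`) and a row layout of varint field count, length-prefixed null bitmap, then each non-null field in the nested context; all helper names are hypothetical. Only the `f_bytes` component gets a varint length prefix; the row itself gets none, so `encode_row` has no context parameter at all.

```python
from typing import Optional


def encode_varint(n: int) -> bytes:
    """Unsigned varint: 7 bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while True:
        bits = n & 0x7F
        n >>= 7
        if n:
            out.append(bits | 0x80)
        else:
            out.append(bits)
            return bytes(out)


def encode_bytes(value: bytes, nested: bool) -> bytes:
    # Nested context: a length prefix tells the decoder where the value ends.
    # Outer context: the value runs to the end of the stream, so no prefix.
    return encode_varint(len(value)) + value if nested else value


def encode_row(f_bool: bool, f_bytes: Optional[bytes]) -> bytes:
    """Hypothetical encoder for the assumed schema (f_bool, nullable f_bytes)."""
    fields = [f_bool, f_bytes]
    out = bytearray(encode_varint(len(fields)))  # 1. number of fields
    nulls = [v is None for v in fields]
    bitmap = bytearray()
    if any(nulls):
        # Bit i of the bitmap is set when field i is null.
        bitmap = bytearray((len(fields) + 7) // 8)
        for i, is_null in enumerate(nulls):
            if is_null:
                bitmap[i // 8] |= 1 << (i % 8)
    out += encode_varint(len(bitmap)) + bitmap  # 2. length-prefixed null bitmap
    out.append(1 if f_bool else 0)  # 3a. bool: a single byte
    if f_bytes is not None:
        out += encode_bytes(f_bytes, nested=True)  # 3b. subcomponent is nested
    return bytes(out)


print(encode_row(False, b"ab\x00c"))
```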
So, I reiterate: why is it nested: false instead of nested: true, if the
encoding is going to be identical in both contexts?
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 466320)
Time Spent: 4h 40m (was: 4.5h)
> Add support for remaining data types in python RowCoder
> --------------------------------------------------------
>
> Key: BEAM-7996
> URL: https://issues.apache.org/jira/browse/BEAM-7996
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-core
> Reporter: Brian Hulette
> Assignee: Brian Hulette
> Priority: P2
> Time Spent: 4h 40m
> Remaining Estimate: 0h
>
> In the initial [python RowCoder
> implementation|https://github.com/apache/beam/pull/9188] we only added
> support for the data types that already had coders in the Python SDK. We
> should add support for the remaining data types that are not currently
> supported:
> * INT8 (ByteCoder in Java)
> * INT16 (BigEndianShortCoder in Java)
> * FLOAT (FloatCoder in Java) (Note: doubles are supported, this is
> specifically for single-precision)
> * --BOOLEAN (standard beam:coder:bool:v1, BooleanCoder in Java)--
> * --BYTES (standard beam:coder:bytes:v1, ByteArrayCoder in Java)--
> * Map (MapCoder in Java)
> We might consider making those coders standard so they can be tested
> independently from RowCoder in standard_coders.yaml. Or, if we don't do that
> we should probably add a more robust testing framework for RowCoder itself,
> because it will be challenging to test all of these types as part of the
> RowCoder tests in standard_coders.yaml.
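For the fixed-width types in the list above, a minimal Python sketch of byte layouts matching the Java coders it names might look like this. It assumes the Python side should be byte-compatible with Java's ByteCoder (one signed byte), BigEndianShortCoder (two bytes, big-endian) and FloatCoder (four-byte IEEE 754 single precision, big-endian); the function names are hypothetical and are not part of the Beam Python SDK.

```python
import struct


# Hypothetical helpers; the struct format strings mirror the Java coders' layouts.
def encode_int8(v: int) -> bytes:
    return struct.pack(">b", v)  # INT8 / ByteCoder: one signed byte


def decode_int8(b: bytes) -> int:
    return struct.unpack(">b", b)[0]


def encode_int16(v: int) -> bytes:
    return struct.pack(">h", v)  # INT16 / BigEndianShortCoder: 2 bytes, big-endian


def decode_int16(b: bytes) -> int:
    return struct.unpack(">h", b)[0]


def encode_float(v: float) -> bytes:
    return struct.pack(">f", v)  # FLOAT / FloatCoder: IEEE 754 single, big-endian


def decode_float(b: bytes) -> float:
    return struct.unpack(">f", b)[0]


print(encode_int16(258), encode_float(1.0))
```

Because each value has a fixed width, none of these needs a length prefix in either context. Packing with `>f` also rounds a Python float (a double) to single precision, which is the distinction the FLOAT bullet draws.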
--
This message was sent by Atlassian Jira
(v8.3.4#803005)