Abacn commented on issue #19817: URL: https://github.com/apache/beam/issues/19817#issuecomment-1227377405
> There is a problem with replacing all DATETIME field types with a logical type, per @TheNeuralBit:
>
> > One thing that I realized as I wrote up my PR: If we make DATETIME a logical type backed by INT64, then it will have to use VarLongCoder, rather than InstantCoder as it currently does.
> > It seems like that would be a problem since InstantCoder is designed to make lexicographic order correspond to chronological order.
>
> Note that InstantCoder is used not only in processing schemas but also in coding windows: https://github.com/apache/beam/blob/bf39489b2a1fd45e6798483d083e4ad240f66891/sdks/java/core/src/main/java/org/apache/beam/sdk/util/WindowedValue.java#L618
>
> When I tried to replace the InstantCoder implementation using either a VarInt or a ByteArray, it broke window decoding here. It seems that some predefined stream does not use InstantCoder.encode to pack a timestamp.
>
> Thus I decided not to change InstantCoder at this moment, and instead extended support for the MicrosInstant logical type in the Java SDK and the DateTime logical type in the Python SDK. This seems to be the simplest way to make cross-language (xlang) reads and writes of records containing a timestamp type work.
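The ordering concern quoted above can be illustrated with a small, self-contained sketch. It assumes that VarLongCoder writes a base-128 varint of the value's 64-bit two's-complement bits (like Java's `VarInt`), and that InstantCoder shifts the millis by `Long.MIN_VALUE` and writes them as a big-endian 8-byte long. Both encoders below are simplified re-implementations for illustration, not Beam's code; sorting the encoded byte strings shows which encoding keeps chronological order.

```python
import struct

LONG_MIN = -(1 << 63)  # Java's Long.MIN_VALUE


def varlong_encode(value: int) -> bytes:
    """Simplified varint sketch (assumed to mirror VarLongCoder's wire format)."""
    v = value & 0xFFFFFFFFFFFFFFFF  # reinterpret as unsigned 64-bit two's complement
    out = bytearray()
    while True:
        bits = v & 0x7F
        v >>= 7
        if v:
            out.append(bits | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(bits)
            return bytes(out)


def shifted_big_endian_encode(millis: int) -> bytes:
    """Simplified sketch of InstantCoder's approach: shift into unsigned range, write big-endian."""
    return struct.pack(">Q", millis - LONG_MIN)


# Millis since epoch: one negative value, small values, and the test-case instant
timestamps = [-1, 0, 255, 256, 1597328054123]
by_varlong = sorted(timestamps, key=varlong_encode)
by_shifted = sorted(timestamps, key=shifted_big_endian_encode)
print(by_varlong)  # [0, 256, 1597328054123, 255, -1]
print(by_shifted)  # [-1, 0, 255, 256, 1597328054123]
```

With the varint-style encoding, 256 sorts before 255 and the negative timestamp -1 sorts last, so byte-wise comparison no longer matches time order. The shifted big-endian encoding keeps every value in chronological order, which is why swapping the coder is risky wherever encoded timestamps are compared as bytes.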
I managed to generate a test case using the following code snippet:

```python
from typing import NamedTuple

from apache_beam.coders.row_coder import RowCoder
from apache_beam.typehints.schemas import LogicalType, named_tuple_to_schema
from apache_beam.utils.timestamp import Timestamp


class TestTuple(NamedTuple):  # assumed definition, inferred from the fields used below
    f_timestamp: Timestamp
    f_string: str
    f_int: int


def generate_millis():
    registry = LogicalType._known_logical_types
    # Logical type that deals with the millis_instant URN (MillisInstant)
    MillisLogicalType = registry.get_logical_type_by_urn('beam:logical_type:millis_instant:v1')
    # Original logical type used to represent Timestamp (MicrosInstant)
    TimestampLogicalType = registry.get_logical_type_by_language_type(Timestamp)
    registry.by_language_type[Timestamp] = MillisLogicalType  # temporary override
    try:
        schema = named_tuple_to_schema(TestTuple)
        coder = RowCoder(schema)
        print("payload = %s" % schema.SerializeToString())
        examples = (
            TestTuple(
                f_timestamp=Timestamp.from_rfc3339("2020-08-13T14:14:14.123Z"),
                f_string="2020-08-13T14:14:14.123Z",
                f_int=1597328054123),  # same instant as millis since epoch
        )
        for example in examples:
            print("example = %s" % coder.encode(example))
    finally:
        # Recover the original registration, even if encoding fails
        registry.by_language_type[Timestamp] = TimestampLogicalType
```

The workaround is to temporarily change the mapping of `Timestamp` to the MillisInstant logical type. Without it, `Timestamp` always maps to the MicrosInstant logical type in Python.
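To see what the workaround changes, the sketch below computes, for the instant used in the test case, the value each logical type would carry. It assumes that MicrosInstant represents a `Timestamp` as a `(seconds, micros)` pair and MillisInstant as a single int64 of milliseconds since the epoch; `MicrosInstantRepr`, `to_micros_instant` and `to_millis_instant` are hypothetical helpers written in plain Python, not Beam APIs.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class MicrosInstantRepr(NamedTuple):
    """Hypothetical stand-in for the (seconds, micros) pair MicrosInstant is assumed to use."""
    seconds: int
    micros: int


def to_micros_instant(micros_since_epoch: int) -> MicrosInstantRepr:
    # Split total microseconds into whole seconds and the sub-second remainder
    return MicrosInstantRepr(micros_since_epoch // 1_000_000, micros_since_epoch % 1_000_000)


def to_millis_instant(micros_since_epoch: int) -> int:
    # Floor to whole milliseconds (sub-millisecond precision is dropped)
    return micros_since_epoch // 1_000


EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
instant = datetime(2020, 8, 13, 14, 14, 14, 123000, tzinfo=timezone.utc)
micros = (instant - EPOCH) // timedelta(microseconds=1)  # exact integer division

print(to_micros_instant(micros))  # MicrosInstantRepr(seconds=1597328054, micros=123000)
print(to_millis_instant(micros))  # 1597328054123
```

The millis value equals `f_int=1597328054123` in the test case, so the MillisInstant mapping yields a single millisecond count, whereas the default MicrosInstant mapping splits the same instant into two fields.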
