Abacn commented on issue #19817:
URL: https://github.com/apache/beam/issues/19817#issuecomment-1227377405

   > There is a problem with replacing every DATETIME field type with a logical
   > type, per @TheNeuralBit:
   > 
   > > One thing that I realized as I wrote up my PR: If we make DATETIME a
   > > logical type backed by INT64, then it will have to use VarLongCoder, rather
   > > than InstantCoder as it currently does.
   > > It seems like that would be a problem since InstantCoder is designed to
   > > make lexicographic order correspond to chronological order.
   > > Note that InstantCoder is not only used in processing schemas but also
   > > in coding windows:
   > > https://github.com/apache/beam/blob/bf39489b2a1fd45e6798483d083e4ad240f66891/sdks/java/core/src/main/java/org/apache/beam/sdk/util/WindowedValue.java#L618
   > > 
   > > I tried replacing the InstantCoder implementation with either a VarInt or
   > > a ByteArray encoding, but that breaks window decoding here. It seems that
   > > some predefined stream packs a timestamp without going through
   > > InstantCoder.encode.
   > 
   > Thus I decided not to change InstantCoder at this moment and instead to
   > extend support for the MicrosInstant logical type in the Java SDK and the
   > DateTime logical type in the Python SDK. That seems to be the simplest way to
   > make xlang reads and writes of records containing a timestamp type work.
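
To illustrate the ordering concern quoted above, here is a minimal standalone sketch in plain Python (no Beam imports). It assumes that InstantCoder writes the millisecond value shifted into the unsigned range as 8 big-endian bytes, and that VarLongCoder writes a base-128 varint; both helper functions below are hypothetical stand-ins written for this comparison, not Beam APIs.

```python
import struct

SIGN_FLIP = 1 << 63  # shifts the signed 64-bit range onto the unsigned range


def encode_order_preserving(millis: int) -> bytes:
    """Hypothetical stand-in: shifted value as 8 big-endian bytes."""
    return struct.pack(">Q", millis + SIGN_FLIP)


def encode_varint(value: int) -> bytes:
    """Hypothetical stand-in: base-128 varint of the 64-bit two's complement."""
    value &= (1 << 64) - 1
    out = bytearray()
    while True:
        bits = value & 0x7F
        value >>= 7
        if value:
            out.append(bits | 0x80)  # more groups follow
        else:
            out.append(bits)
            return bytes(out)


# Millisecond timestamps, already in chronological order.
timestamps = [-1000, -1, 0, 1, 127, 128, 300, 1597328054123]

# Sorting by encoded bytes keeps chronological order only for the fixed-width form.
fixed_sorted = sorted(timestamps, key=encode_order_preserving)
varint_sorted = sorted(timestamps, key=encode_varint)

print(fixed_sorted == timestamps)
print(varint_sorted == timestamps)
```

With the varint form, the low-order group comes first and negative values take the longest encodings, so comparing bytes no longer matches comparing timestamps.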
   
   I managed to generate a test case with the following code snippet:
   
   ```python
   
   import typing

   from apache_beam.coders.row_coder import RowCoder
   from apache_beam.typehints.schemas import LogicalType, named_tuple_to_schema
   from apache_beam.utils.timestamp import Timestamp


   # Assumed definition: TestTuple was not shown in the original snippet; the
   # field names and types are inferred from how it is used below.
   class TestTuple(typing.NamedTuple):
       f_timestamp: Timestamp
       f_string: str
       f_int: int


   def generate_millis():
       registry = LogicalType._known_logical_types
       # Logical type that handles the millis_instant URN (MillisInstant)
       MillisLogicalType = registry.get_logical_type_by_urn(
           'beam:logical_type:millis_instant:v1')
       # Logical type originally used to represent Timestamp (MicrosInstant)
       TimestampLogicalType = registry.get_logical_type_by_language_type(
           Timestamp)

       # Temporarily map Timestamp to MillisInstant
       registry.by_language_type[Timestamp] = MillisLogicalType
       try:
           schema = named_tuple_to_schema(TestTuple)
           coder = RowCoder(schema)
           print("payload = %s" % schema.SerializeToString())
           examples = (TestTuple(
               f_timestamp=Timestamp.from_rfc3339("2020-08-13T14:14:14.123Z"),
               f_string="2020-08-13T14:14:14.123Z",
               f_int=1597328054123),)
           for example in examples:
               print("example = %s" % coder.encode(example))
       finally:
           # Restore the original registration even if encoding fails
           registry.by_language_type[Timestamp] = TimestampLogicalType
   
   ```
   
   The workaround is to temporarily change the mapping of Timestamp to the
   MillisInstant logical type. Without it, Timestamp always maps to the
   MicrosInstant logical type in Python.
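
As a sanity check on the test values, the sketch below uses only the Python standard library (not Beam's Timestamp class) to show how the RFC 3339 string in the example relates to the `f_int` value, in both millisecond and microsecond precision. The variable names are illustrative only.

```python
from datetime import datetime, timezone

RFC3339 = "2020-08-13T14:14:14.123Z"

# Parse the string used for f_timestamp / f_string in the snippet above.
parsed = datetime.strptime(RFC3339, "%Y-%m-%dT%H:%M:%S.%fZ").replace(
    tzinfo=timezone.utc)
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
delta = parsed - epoch

# Integer arithmetic avoids floating-point rounding.
micros = (delta.days * 86400 + delta.seconds) * 1_000_000 + delta.microseconds
millis = micros // 1000

print("micros since epoch:", micros)
print("millis since epoch:", millis)
```

The millisecond value matches `f_int` in the test tuple. The microsecond value is the same instant at the precision a MicrosInstant-style representation would carry.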

