mattalbr commented on code in PR #38590:
URL: https://github.com/apache/beam/pull/38590#discussion_r3283960587


##########
sdks/python/apache_beam/coders/coder_impl.py:
##########
@@ -28,16 +28,19 @@
 
 For internal use only; no backwards-compatibility guarantees.
 """
+
 # pytype: skip-file
 
 # ruff: noqa: UP006
 import dataclasses
+import datetime
 import decimal
 import enum
 import itertools
 import json
 import logging
 import pickle
+import zoneinfo

Review Comment:
   Seems legit, done



##########
sdks/python/apache_beam/coders/coder_impl.py:
##########
@@ -441,6 +449,22 @@ def encode_to_stream(self, value, stream, nested):
     elif t is bool:
       stream.write_byte(BOOL_TYPE)
       stream.write_byte(value)
+    elif t is datetime.datetime:
+      # We use RFC 9557 for lossless encoding of timezone info.
+      stream.write_byte(DATETIME_TYPE)
+      stream.write(value.isoformat().encode("utf-8"))
+      if value.tzinfo is not None and type(
+          value.tzinfo) is not datetime.timezone:
+        stream.write(f"[{value.tzinfo}]".encode("utf-8"))
+      if type(
+          value.tzinfo) is datetime.timezone and (tzname :=
+                                                  value.tzname()) is not None:
+        stream.write(f"[tzn={tzname}]".encode("utf-8"))
+      if value.fold != 0:
+        stream.write(f"[f={value.fold}]".encode("utf-8"))
+    elif t is datetime.date:
+      stream.write_byte(DATE_TYPE)
+      stream.write(value.isoformat().encode("utf-8"))

Review Comment:
   nested defaults to False and this matches the style of the other 
implementations which also omit the nested parameter when not needed



##########
sdks/python/apache_beam/coders/coder_impl.py:
##########
@@ -614,6 +645,46 @@ def decode_from_stream(self, stream, nested):
       return v
     elif t == BOOL_TYPE:
       return not not stream.read_byte()
+    elif t == DATETIME_TYPE:
+      rfc_9557_str = stream.read_all(nested).decode("utf-8")
+      first_tag_idx = rfc_9557_str.find("[")
+      if first_tag_idx == -1:
+        return datetime.datetime.fromisoformat(rfc_9557_str)
+
+      base_iso = rfc_9557_str[:first_tag_idx]
+      tags_str = rfc_9557_str[first_tag_idx:]
+      dt = datetime.datetime.fromisoformat(base_iso)
+
+      fold = 0
+      zone_name = None
+      tz_name = None
+
+      tags = tags_str.replace("]", "").split("[")
+      for tag in tags:
+        if not tag:
+          continue
+        if tag.startswith("f="):
+          fold = int(tag[2:])
+        elif tag.startswith("tzn="):
+          tz_name = tag[4:]
+        elif "=" in tag:
+          # Skip unknown tags like [knort=blorgel]
+          continue
+        else:
+          zone_name = tag
+
+      if tz_name and (offset := dt.utcoffset()) is not None:
+        dt = dt.replace(tzinfo=datetime.timezone(offset=offset, name=tz_name))
+      elif zone_name:
+        dt = dt.replace(tzinfo=zoneinfo.ZoneInfo(zone_name))

Review Comment:
   Done



##########
sdks/python/apache_beam/coders/coder_impl.py:
##########
@@ -441,6 +449,22 @@ def encode_to_stream(self, value, stream, nested):
     elif t is bool:
       stream.write_byte(BOOL_TYPE)
       stream.write_byte(value)
+    elif t is datetime.datetime:
+      # We use RFC 9557 for lossless encoding of timezone info.
+      stream.write_byte(DATETIME_TYPE)
+      stream.write(value.isoformat().encode("utf-8"))
+      if value.tzinfo is not None and type(
+          value.tzinfo) is not datetime.timezone:
+        stream.write(f"[{value.tzinfo}]".encode("utf-8"))
+      if type(
+          value.tzinfo) is datetime.timezone and (tzname :=
+                                                  value.tzname()) is not None:
+        stream.write(f"[tzn={tzname}]".encode("utf-8"))
+      if value.fold != 0:
+        stream.write(f"[f={value.fold}]".encode("utf-8"))

Review Comment:
   nested defaults to False and this matches the style of the other 
implementations which also omit the nested parameter when not needed.
   
   It's also likely more efficient to write to the stream multiple times which 
has a byte buffer vs constructing new immutable strings



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to