Hi,
Thanks for taking care of this issue in the Python SDK, Thomas!
It would be nice to have a uniform precision for timestamps but, as Kenn
pointed out, timestamps are extracted from systems that have different
precision.
To add to the list: Flink - milliseconds
In the end, the exact unit doesn't matter as long as there is sufficient
precision and conversions are done correctly.
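To make "done correctly" concrete, here is a minimal sketch of the two directions between millis and micros. The function names are hypothetical and not part of any Beam SDK; the assumption is that timestamps are plain integers counted from the Unix epoch.

```python
MICROS_PER_MILLI = 1000


def millis_to_micros(millis: int) -> int:
    # Widening is exact: every millisecond value has a microsecond equivalent.
    return millis * MICROS_PER_MILLI


def micros_to_millis(micros: int) -> int:
    # Narrowing is lossy. Floor division rounds toward negative infinity,
    # so pre-epoch (negative) values are treated the same way as positive ones.
    return micros // MICROS_PER_MILLI


event_millis = 1_555_467_180_123
# Round-tripping through the finer unit loses nothing.
assert micros_to_millis(millis_to_micros(event_millis)) == event_millis
# One microsecond before the epoch belongs to millisecond -1, not 0.
assert micros_to_millis(-1) == -1
```

The point of flooring rather than truncating toward zero is that the converted value never ends up later than the original timestamp.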
I think we could improve the situation by at least adding a
"milliseconds" constructor to the Python SDK's Timestamp.
Cheers,
Max
On 17.04.19 04:13, Kenneth Knowles wrote:
I am not so sure this is a good idea. Here are some systems and their
precision:
Arrow - microseconds
BigQuery - microseconds
New Java instant - nanoseconds
Firestore - microseconds
Protobuf - nanoseconds
Dataflow backend - microseconds
Postgresql - microseconds
Pubsub publish time - nanoseconds
MSSQL datetime2 - 100 nanoseconds (original datetime about 3 millis)
Cassandra - milliseconds
IMO it is important to be able to treat any of these as a Beam
timestamp, even though they aren't all streaming. Who knows when we
might be ingesting a streamed changelog, or using them for reprocessing
an archived stream. I think for this purpose we should either
standardize on nanoseconds or make the runner's resolution independent
of the data representation.
I've had some offline conversations about this. I think we can have
higher-than-runner precision in the user data, and allow WindowFns and
DoFns to operate on this higher-than-runner precision data, and still
have consistent watermark treatment. Watermarks are just bounds, after all.
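A rough sketch of why that can work, assuming (hypothetically) nanosecond timestamps in the user data and a runner that tracks watermarks in milliseconds. None of these names are Beam APIs; they only illustrate the bound.

```python
NANOS_PER_MILLI = 1_000_000


def to_runner_millis(event_nanos: int) -> int:
    # Floor to the runner's coarser resolution; the result is never later
    # than the original timestamp.
    return event_nanos // NANOS_PER_MILLI


def watermark_millis(pending_event_nanos) -> int:
    # A watermark only has to be a lower bound on the timestamps still
    # pending, so the minimum of the floored values is sufficient.
    return min(to_runner_millis(n) for n in pending_event_nanos)


pending = [
    1_555_467_180_123_456_789,
    1_555_467_180_123_999_999,
    1_555_467_180_124_000_001,
]
wm = watermark_millis(pending)
# The coarse watermark never exceeds any full-precision timestamp.
assert all(wm * NANOS_PER_MILLI <= n for n in pending)
```

The user data keeps its full nanosecond values; only the bound the runner reasons about is coarsened.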
Kenn
On Tue, Apr 16, 2019 at 6:48 PM Thomas Weise <t...@apache.org> wrote:
The Python SDK currently uses timestamps in microsecond resolution
while the Java SDK, as most would probably expect, uses milliseconds.
This causes a few difficulties with portability: Python coders need
to convert to millis for WindowedValue and Timers, which is related
to a bug I'm looking into:
https://issues.apache.org/jira/browse/BEAM-7035
As Luke pointed out, the issue was previously discussed:
https://issues.apache.org/jira/browse/BEAM-1524
I'm not privy to the reasons why we decided to go with micros in
the first place, but would it be too big of a change or impractical for
other reasons to switch the Python SDK to millis before it gets more users?
Thanks,
Thomas