Re: Python SDK timestamp precision

Maximilian Michels Wed, 17 Apr 2019 05:43:39 -0700

Hi,

Thanks for taking care of this issue in the Python SDK, Thomas!

It would be nice to have a uniform precision for timestamps but, as Kenn pointed out, timestamps are extracted from systems that have different precision.


To add to the list: Flink - milliseconds

After all, it doesn't matter as long as there is sufficient precision and conversions are done correctly.

I think we could improve the situation by at least adding a "milliseconds" constructor to the Python SDK's Timestamp.
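
A minimal sketch of what such a constructor might look like. The class below is a simplified, hypothetical stand-in that stores integer microseconds; it is not the SDK's actual Timestamp, and the names from_millis and to_millis are assumptions made for illustration only.

```python
class Timestamp:
    """Hypothetical, simplified stand-in for the Python SDK's Timestamp."""

    def __init__(self, seconds=0, micros=0):
        # Internal representation: integer microseconds since the epoch.
        self.micros = int(seconds * 1000000) + int(micros)

    @classmethod
    def from_millis(cls, millis):
        # Hypothetical "milliseconds" constructor. The conversion is exact
        # because one millisecond is exactly 1000 microseconds.
        return cls(micros=int(millis) * 1000)

    def to_millis(self):
        # Floor division rounds toward negative infinity, so timestamps
        # before the epoch are rounded down in the same direction.
        return self.micros // 1000

    def __eq__(self, other):
        return isinstance(other, Timestamp) and self.micros == other.micros

    def __hash__(self):
        return hash(self.micros)

    def __repr__(self):
        return "Timestamp(micros=%d)" % self.micros
```

Going from millis to micros loses nothing; going the other way drops the sub-millisecond part, so the rounding direction has to be chosen deliberately and applied consistently.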


Cheers,
Max

On 17.04.19 04:13, Kenneth Knowles wrote:

I am not so sure this is a good idea. Here are some systems and their precision:
Arrow - microseconds
BigQuery - microseconds
New Java instant - nanoseconds
Firestore - microseconds
Protobuf - nanoseconds
Dataflow backend - microseconds
Postgresql - microseconds
Pubsub publish time - nanoseconds
MSSQL datetime2 - 100 nanoseconds (original datetime about 3 millis)
Cassandra - milliseconds
IMO it is important to be able to treat any of these as a Beam timestamp, even though they aren't all streaming. Who knows when we might be ingesting a streamed changelog, or using them for reprocessing an archived stream. I think for this purpose we either should standardize on nanoseconds or make the runner's resolution independent of the data representation.
I've had some offline conversations about this. I think we can have higher-than-runner precision in the user data, and allow WindowFns and DoFns to operate on this higher-than-runner precision data, and still have consistent watermark treatment. Watermarks are just bounds, after all.
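
A rough sketch of why this can work, assuming nanosecond timestamps in the user data and a runner that tracks watermarks in milliseconds. The function names are hypothetical and this is not how any particular runner is implemented; it only illustrates that rounding down yields a valid bound.

```python
NANOS_PER_MILLI = 1_000_000


def runner_millis(event_nanos):
    """Round a nanosecond timestamp down to the runner's millisecond grid."""
    return event_nanos // NANOS_PER_MILLI


def watermark_millis(pending_event_nanos):
    """Return a millisecond lower bound on all pending nanosecond timestamps."""
    return min(runner_millis(t) for t in pending_event_nanos)


# Three pending elements carrying nanosecond-precision timestamps.
pending = [1_500_000_999, 1_500_000_001, 2_000_000_000]
wm = watermark_millis(pending)

# Every pending element is at or after the watermark, even when the
# comparison is made at full nanosecond precision.
assert all(t >= wm * NANOS_PER_MILLI for t in pending)
```

Because flooring never moves a timestamp forward, the coarser watermark stays a lower bound on the finer-grained element timestamps.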
Kenn
On Tue, Apr 16, 2019 at 6:48 PM Thomas Weise <t...@apache.org> wrote:
    The Python SDK currently uses timestamps in microsecond resolution,
    while the Java SDK, as most would probably expect, uses milliseconds.

    This causes a few difficulties with portability (Python coders need
    to convert to millis for WindowedValue and Timers), which is related
    to a bug I'm looking into:

    https://issues.apache.org/jira/browse/BEAM-7035

    As Luke pointed out, the issue was previously discussed:

    https://issues.apache.org/jira/browse/BEAM-1524

    I'm not privy to the reasons why we decided to go with micros in
    the first place, but would it be too big of a change or impractical
    for other reasons to switch the Python SDK to millis before it gets
    more users?

    Thanks,
    Thomas
