Hi,

Thanks for taking care of this issue in the Python SDK, Thomas!

It would be nice to have a uniform precision for timestamps but, as Kenn pointed out, timestamps are extracted from systems that have different precision.

To add to the list: Flink - milliseconds

After all, it doesn't matter as long as there is sufficient precision and conversions are done correctly.

I think we could improve the situation by at least adding a "milliseconds" constructor to the Python SDK's Timestamp.
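
To make the suggestion concrete, here is a rough sketch of what such a constructor could look like. It is not the SDK's actual Timestamp class: MicrosTimestamp, from_millis and to_millis are hypothetical names, and the only assumption taken from this thread is that the Python side stores microseconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class MicrosTimestamp:
    """Hypothetical stand-in for a timestamp stored in microseconds."""

    micros: int

    @classmethod
    def from_millis(cls, millis: int) -> "MicrosTimestamp":
        # Widening millis -> micros is exact, so no rounding is involved.
        return cls(micros=millis * 1000)

    def to_millis(self) -> int:
        # Floor division rounds toward negative infinity, so pre-epoch
        # timestamps are not silently rounded toward zero.
        return self.micros // 1000
```

Going from millis to micros and back is lossless; only the micros to millis direction drops precision, which is where the conversion has to be done carefully.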

Cheers,
Max

On 17.04.19 04:13, Kenneth Knowles wrote:
I am not so sure this is a good idea. Here are some systems and their precision:

Arrow - microseconds
BigQuery - microseconds
New Java instant - nanoseconds
Firestore - microseconds
Protobuf - nanoseconds
Dataflow backend - microseconds
Postgresql - microseconds
Pubsub publish time - nanoseconds
MSSQL datetime2 - 100 nanoseconds (original datetime about 3 millis)
Cassandra - milliseconds

IMO it is important to be able to treat any of these as a Beam timestamp, even though they aren't all streaming. Who knows when we might be ingesting a streamed changelog, or using them for reprocessing an archived stream. I think for this purpose we should either standardize on nanoseconds or make the runner's resolution independent of the data representation.

I've had some offline conversations about this. I think we can have higher-than-runner precision in the user data, and allow WindowFns and DoFns to operate on this higher-than-runner precision data, and still have consistent watermark treatment. Watermarks are just bounds, after all.
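
A small sketch of why this can stay consistent, under assumptions that are mine for illustration only: element timestamps in nanoseconds, a runner watermark in milliseconds, and hypothetical helper names. Flooring the element timestamp to the runner's resolution gives the same "behind the watermark" answer as comparing at full precision.

```python
NANOS_PER_MILLI = 1_000_000


def floor_to_millis(nanos: int) -> int:
    # Floor division, so the result is always <= the true timestamp,
    # including for pre-epoch (negative) values.
    return nanos // NANOS_PER_MILLI


def is_behind_watermark(element_nanos: int, watermark_millis: int) -> bool:
    # Comparison done entirely at the runner's (coarser) resolution.
    return floor_to_millis(element_nanos) < watermark_millis


def is_behind_watermark_exact(element_nanos: int, watermark_millis: int) -> bool:
    # The same comparison done at the data's full resolution.
    return element_nanos < watermark_millis * NANOS_PER_MILLI
```

The two predicates agree for every integer input, so user code can keep the nanosecond value while the runner only ever sees the floored one.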

Kenn

On Tue, Apr 16, 2019 at 6:48 PM Thomas Weise <t...@apache.org> wrote:

    The Python SDK currently uses timestamps in microsecond resolution
    while the Java SDK, as most would probably expect, uses milliseconds.

    This causes a few difficulties with portability (Python coders need
    to convert to millis for WindowedValue and Timers), which is related
    to a bug I'm looking into:

    https://issues.apache.org/jira/browse/BEAM-7035

    As Luke pointed out, the issue was previously discussed:

    https://issues.apache.org/jira/browse/BEAM-1524

    I'm not privy to the reasons why we decided to go with micros in
    the first place, but would it be too big of a change or impractical for
    other reasons to switch the Python SDK to millis before it gets more users?

    Thanks,
    Thomas
