+1 to offering more granular timestamps in general. I think it will be
odd if setting the element timestamp from a row DATETIME field is
lossy, so we should seriously consider upgrading that as well.
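
To make the lossiness concrete, here is a minimal sketch, assuming we store the value as epoch seconds plus nanos (roughly what Reuven suggests below). NanoTimestamp and its methods are hypothetical names for illustration, not existing Beam classes, and the millisecond accessor only stands in for a Joda-based one:

```java
import java.time.Instant;

/** Hypothetical holder for a nanosecond-precision timestamp; not a Beam class. */
final class NanoTimestamp {
  private final long epochSecond; // whole seconds since 1970-01-01T00:00:00Z
  private final int nanoOfSecond; // 0..999,999,999

  private NanoTimestamp(long epochSecond, int nanoOfSecond) {
    this.epochSecond = epochSecond;
    this.nanoOfSecond = nanoOfSecond;
  }

  /** Captures the full precision of a java.time.Instant. */
  static NanoTimestamp of(Instant instant) {
    return new NanoTimestamp(instant.getEpochSecond(), instant.getNano());
  }

  /** Full-precision view; round-trips the original Instant exactly. */
  Instant toInstant() {
    return Instant.ofEpochSecond(epochSecond, nanoOfSecond);
  }

  /**
   * Millisecond view, standing in for a Joda-style accessor (assumption).
   * Sub-millisecond digits are dropped here, so this direction is lossy.
   */
  long toEpochMillis() {
    return toInstant().toEpochMilli();
  }
}
```

With a nano-of-second value of 123456789, the millisecond view keeps 123 and drops the remaining 456789 nanoseconds, so rebuilding a timestamp from it does not give back the original value.
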
On Tue, Nov 6, 2018 at 6:42 AM Charles Chen <c...@google.com> wrote:
>
> One related issue that came up before is that we (perhaps unnecessarily) 
> restrict the precision of timestamps in the Python SDK to milliseconds 
> for legacy reasons related to the Java runner's use of Joda-Time.  
> Perhaps Beam portability should natively use a more granular timestamp unit.
>
> On Mon, Nov 5, 2018 at 9:34 PM Rui Wang <ruw...@google.com> wrote:
>>
>> Thanks Reuven!
>>
>> I think Reuven is proposing a third option:
>>
>> Change the internal representation of the DATETIME field in Row, but keep 
>> the public ReadableDateTime getDateTime(String fieldName) API so that 
>> existing code stays compatible. I think we could also add one more API, 
>> such as getDateTimeNanosecond. This differs from option one because 
>> option one actually maintains two implementations of time.
>>
>> -Rui
>>
>> On Mon, Nov 5, 2018 at 9:26 PM Reuven Lax <re...@google.com> wrote:
>>>
>>> I would vote that we change the internal representation of Row to something 
>>> other than Joda. Java 8 times would give us at least microseconds, and if 
>>> we want nanoseconds we could simply store it as a number.
>>>
>>> We should still keep accessor methods that return and take Joda objects, as 
>>> the rest of Beam still depends on Joda.
>>>
>>> Reuven
>>>
>>> On Mon, Nov 5, 2018 at 9:21 PM Rui Wang <ruw...@google.com> wrote:
>>>>
>>>> Hi Community,
>>>>
>>>> The DATETIME field in Beam Schema/Row is implemented with Joda's DateTime 
>>>> (see Row.java#L611 and Row.java#L169). Joda's DateTime is limited to 
>>>> millisecond precision. That is precise enough to represent an event-time 
>>>> timestamp, but it is not enough for real "time" data. For "time" type 
>>>> data, we probably need to support precision up to nanoseconds.
>>>>
>>>> Unfortunately, Joda decided to stay at millisecond precision: 
>>>> https://github.com/JodaOrg/joda-time/issues/139.
>>>>
>>>> If we want to support nanosecond precision, we have two options:
>>>>
>>>> Option one: use the current FieldType's metadata field, so that we can 
>>>> set a marker in the metadata and Row can check it to decide what is 
>>>> stored in the DATETIME field: Joda's DateTime or an implementation that 
>>>> supports nanoseconds.
>>>>
>>>> Option two: add another field (maybe called a TIMESTAMP field?) whose 
>>>> implementation supports higher time precision.
>>>>
>>>> What do you think about the need for higher precision in the time type, 
>>>> and which option do you prefer?
>>>>
>>>> -Rui
