I agree - this is somewhat more of a general question as @Philipp already 
pointed out. 

I share @Dominik's suggestions and think that we definitely need the feature to 
actively ask users/display a warning in case of a missing timestamp field.

Coming back to the more general aspect: it is in the nature of an event that 
it occurs or is created at a certain point in „time“. Knowing this point in 
time is crucial to deduce contextual knowledge about situations etc. A 
measurement at time T1 might „mean“ something totally different than a 
measurement at time T2, especially when thinking about windowing, streaming 
joins etc.

Thus, I would suggest the following:

1. always actively inform the user that a timestamp field is required
2a. if provided in the raw event stream: mark it (event time) - maybe needs 
transformation using date format strings etc.
2b. if not provided in the raw event stream: add it (ingestion time, when the 
event is processed by the Connect worker instance)

Internally, we then leverage timestamps as UNIX timestamps.
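
To make the proposal above a bit more concrete, here is a minimal sketch in 
Python (not actual Connect code). The function name ensure_timestamp, the 
output key "timestamp", the parameters ts_field/date_format/now_ms and the 
choice of milliseconds are all assumptions for illustration only. Values 
without timezone information are assumed to be UTC.

```python
from datetime import datetime, timezone
import time


def ensure_timestamp(event, ts_field=None, date_format=None, now_ms=None):
    """Return a copy of `event` with a 'timestamp' key in UNIX milliseconds.

    Hypothetical helper: names and the millisecond unit are assumptions.
    """
    out = dict(event)
    if ts_field is not None and ts_field in event:
        # Case 2a: the raw event carries a timestamp (event time).
        raw = event[ts_field]
        if date_format is not None:
            parsed = datetime.strptime(raw, date_format)
            if parsed.tzinfo is None:
                # Assumption: values without a zone are interpreted as UTC.
                parsed = parsed.replace(tzinfo=timezone.utc)
            out["timestamp"] = int(parsed.timestamp() * 1000)
        else:
            # Assumption: the field is already a UNIX timestamp in ms.
            out["timestamp"] = int(raw)
    else:
        # Case 2b: no timestamp in the raw event, so add ingestion time
        # (wall clock of the processing instance).
        out["timestamp"] = now_ms if now_ms is not None else int(time.time() * 1000)
    return out
```

For example, ensure_timestamp({"temp": 21.5}) would fall into case 2b and 
attach the current wall clock time, while passing ts_field and date_format 
covers case 2a.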

As for the suggestion with the index: we could do that, however it doesn’t feel 
intuitive to me. Since we would only use the index as an indicator for the 
seconds elapsed since the adapter was created, you could also just use the 
aforementioned method (2b) to simply add a timestamp on adapter creation using 
the wall clock time of the Connect worker instance.
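
For comparison, the index-based variant could look roughly like the 
following sketch. Again, this is only an illustration: index_to_timestamp, 
start_ms and sampling_hz are hypothetical names, and I am assuming a fixed 
sampling frequency as Marco suggested.

```python
def index_to_timestamp(index, start_ms, sampling_hz=1.0):
    """Derive a UNIX timestamp (ms) from a row index.

    Hypothetical helper: `start_ms` is the wall clock time at adapter
    creation, `sampling_hz` the assumed number of events per second.
    """
    if sampling_hz <= 0:
        raise ValueError("sampling_hz must be positive")
    # Each index step advances time by 1000 / sampling_hz milliseconds.
    return start_ms + round(index * 1000 / sampling_hz)
```

With sampling_hz=1.0 the index is just the number of seconds since adapter 
creation, which is why adding the wall clock time directly (2b) gives a 
similar result without an extra index column.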

Doesn’t this cover all cases or am I missing some?

Patrick


> On 06.08.2020 at 23:14, Philipp Zehnder <zehn...@apache.org> wrote:
> 
> Hi Marco,
> 
> do you mean this as a solution for all adapters, or for the file stream 
> adapter?
> 
> If you mean it for the file stream adapter, then I would suggest that we 
> mention in the documentation that a user should add an index column.
> Then mark this column as a timestamp and provide the regex “s” (each number 
> is then interpreted as seconds).
> I like the idea of using the line index, but I do not know how we could 
> implement this generically for all the different formats. Do you have an idea?
> 
> Philipp
> 
>> On 6. Aug 2020, at 16:03, Marco Heyden <heydenmarc...@gmail.com> wrote:
>> 
>> Hey, 
>> 
>> Maybe another option would be to use the data index as a default timestamp, 
>> if no other timestamp is provided. Then one could specify a sampling 
>> frequency and obtain the relative time since the start of recording.
>> 
>> What do you think?
>> 
>> Best
>> Marco
>> 
>> On 06.08.20, 15:57, "Philipp Zehnder" <zehn...@apache.org> wrote:
>> 
>>   Hi,
>> 
>>   this is a general question. Do we want a timestamp in each event?
>>   I think it makes sense to have a timestamp in each event, because then we 
>> always know when they occurred. When there is no timestamp in the data it 
>> can be added in the adapter. What is your opinion on that?
>> 
>>   With connect we have one case where a timestamp is required.
>>   For the file stream adapter, we use the timestamp to replay the events 
>> according to the offset between the timestamps in the events in the file.
>>   This enables us to simulate the original data stream.
>>   Therefore, we need a timestamp in the event schema. The event schema 
>> component is independent of the adapter used, so we do not know whether the 
>> timestamp is required or not. 
>> 
>>   Philipp
>> 
>> 
>>> On 6. Aug 2020, at 10:31, Dominik Riemer <rie...@apache.org> wrote:
>>> 
>>> Hi,
>>> 
>>> is there an advantage of requiring a timestamp in every event? Maybe we 
>>> could also only display a warning or actively ask users in Connect in case 
>>> a timestamp is missing and force the addition of timestamps in one of the 
>>> following releases.
>>> 
>>> Dominik
>>> 
>>> On 2020/08/04 17:52:07, Patrick Wiener <wie...@apache.org> wrote: 
>>>> Hi Philipp,
>>>> 
>>>> I think that is definitely a valuable feature to check for timestamp 
>>>> existence before creating the adapter since we have various processors 
>>>> or sinks that rely on a timestamp. 
>>>> 
>>>> One possible solution could be to notify users immediately in case a 
>>>> timestamp field is missing, e.g. in a dialog. 
>>>> 
>>>> 
>>>> Patrick
>>>> 
>>>>> On 04.08.2020 at 19:40, Philipp Zehnder <zehn...@apache.org> wrote:
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> I am currently reworking the schema editor in Connect to work with the 
>>>>> newly generated model. 
>>>>> The following question came up: Should we ensure that there is a 
>>>>> timestamp in the event? 
>>>>> I.e. users have to add a timestamp or mark a property as a timestamp. 
>>>>> 
>>>>> What do you think?
>>>>> 
>>>>> Philipp
>>>> 
>>>> 
>> 
>> 
>> 
> 
