[ https://issues.apache.org/jira/browse/SPARK-50160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Schulz updated SPARK-50160:
---------------------------------
    Description: 
Currently, there is no way to customise the timestamp of a {{ProducerRecord}} 
produced by the {{KafkaWriteTask}}. Here at Wikimedia we often use event-time 
semantics, so it would be helpful if Kafka records produced via Spark could 
carry the event time as their timestamp.

The producer already allows this, as stated in the 
[{{Producer.send}}|#send-org.apache.kafka.clients.producer.ProducerRecord-org.apache.kafka.clients.producer.Callback-] 
API docs:
{quote}If CreateTime is used by the topic, the timestamp will be the user 
provided timestamp or the record send time if the user did not specify a 
timestamp for the record.
{quote}
The proposed feature would therefore let users of Spark's Kafka output specify 
the CreateTime timestamp of each record.
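To make the quoted CreateTime rule concrete, here is a minimal Python sketch under stated assumptions. {{Record}}, {{to_record}} and {{stored_timestamp}} are hypothetical names, not Spark or Kafka APIs, and the optional {{timestamp}} field (assumed to be epoch milliseconds) merely stands in for whatever input the proposed feature would read. Under CreateTime a user-provided timestamp is kept and the send time is the fallback; under LogAppendTime, the other value of a topic's {{message.timestamp.type}} setting, the broker's append time is used instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a Kafka ProducerRecord (not a real API)."""
    topic: str
    key: Optional[bytes]
    value: bytes
    timestamp_ms: Optional[int] = None  # None = no user-provided timestamp


def to_record(row: dict, default_topic: str) -> Record:
    """Hypothetical row-to-record mapping that forwards an optional
    'timestamp' field (assumed epoch milliseconds) instead of dropping it."""
    return Record(
        topic=row.get("topic", default_topic),
        key=row.get("key"),
        value=row["value"],
        timestamp_ms=row.get("timestamp"),
    )


def stored_timestamp(record: Record, timestamp_type: str,
                     send_time_ms: int, append_time_ms: int) -> int:
    """Timestamp the topic ends up storing for `record`."""
    if timestamp_type == "CreateTime":
        # The user-provided timestamp wins; otherwise the send time is used.
        if record.timestamp_ms is not None:
            return record.timestamp_ms
        return send_time_ms
    if timestamp_type == "LogAppendTime":
        # The broker's local append time replaces any producer timestamp.
        return append_time_ms
    raise ValueError(f"unknown timestamp type: {timestamp_type}")


event = to_record({"value": b"edit", "timestamp": 1_700_000_000_000}, "events")
plain = to_record({"value": b"edit"}, "events")
print(stored_timestamp(event, "CreateTime", 1_700_000_500_000, 1_700_000_600_000))
# → 1700000000000
print(stored_timestamp(plain, "CreateTime", 1_700_000_500_000, 1_700_000_600_000))
# → 1700000500000
```

In this sketch the only behavioural change is that {{to_record}} forwards a timestamp when one is present; rows without one fall back to the send time, which is what happens today when no timestamp is set.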

I have already checked out the code and was able to adapt it to fulfil our 
needs. Since I couldn't find any (closed) ticket on this topic, I assume no 
such feature has been rejected so far.

  was:
Currently, there is no way to customise the timestamp of a {{ProducerRecord}} 
produced by the {{KafkaWriteTask}}. Here at Wikimedia we often use event-time 
semantics, so it would be helpful if Kafka records produced via Spark could 
carry the event time as their timestamp.

I have already checked out the code and was able to adapt it to fulfil our 
needs. Since I couldn't find any (closed) ticket on this topic, I assume no 
such feature has been rejected so far.


> KafkaWriteTask: support timestamp customisation 
> ------------------------------------------------
>
>                 Key: SPARK-50160
>                 URL: https://issues.apache.org/jira/browse/SPARK-50160
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL, Structured Streaming
>    Affects Versions: 3.5.3
>            Reporter: Peter Schulz
>            Priority: Major
>              Labels: pull-request-available
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
