[
https://issues.apache.org/jira/browse/SPARK-50160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Peter Schulz updated SPARK-50160:
---------------------------------
Description:
Currently, there is no way to customise the timestamp of a {{ProducerRecord}}
produced by the {{KafkaWriteTask}}. Here at Wikimedia we often use
event-time semantics, so it would be helpful if Kafka records produced via
Spark could carry the event time as their timestamp.
The producer already allows this, as stated in the
[{{Producer.send}}|#send-org.apache.kafka.clients.producer.ProducerRecord-org.apache.kafka.clients.producer.Callback-]
API docs:
{quote}If CreateTime is used by the topic, the timestamp will be the user
provided timestamp or the record send time if the user did not specify a
timestamp for the record.
{quote}
The proposed feature therefore lets users of the Spark Kafka output specify
the create time of each record.
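The quoted rule can be illustrated with a small, self-contained model. This is only a sketch of the semantics, not the Kafka client or Spark code: the class {{CreateTimeSketch}} and the method {{effectiveTimestamp}} are hypothetical names chosen for illustration. In the real client, the optional timestamp is passed through the {{ProducerRecord(topic, partition, timestamp, key, value)}} constructor.

```java
// Hypothetical model of the CreateTime rule quoted above.
// It does not use or reproduce the Kafka client implementation.
final class CreateTimeSketch {

    // Returns the user-provided timestamp if one was set,
    // otherwise falls back to the time the record is sent.
    static long effectiveTimestamp(Long userTimestampMs, long sendTimeMs) {
        return userTimestampMs != null ? userTimestampMs : sendTimeMs;
    }

    public static void main(String[] args) {
        long sendTimeMs = System.currentTimeMillis();

        // Event-time semantics: the record keeps the time the event happened.
        Long eventTimeMs = 1_700_000_000_000L;
        System.out.println(effectiveTimestamp(eventTimeMs, sendTimeMs));

        // No user timestamp: the send time is used instead.
        System.out.println(effectiveTimestamp(null, sendTimeMs));
    }
}
```

Under this model, a Spark job that wants event-time semantics would need a way to supply the first argument per record, which is what this ticket asks for.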
I have already checked out the code and was able to adapt it to fulfil our needs.
Since I couldn't find any (closed) ticket concerned with this topic, I assume
no such feature has been rejected so far.
was:
Currently, there is no way to customise the timestamp of a {{ProducerRecord}}
produced by the {{{}KafkaWriteTask{}}}. Here at Wikimedia we often use
event-time semantics, so it would be helpful if kafka records produced via
spark could resemble that.
I already checked out the code and was able to adapted it to fulfil our need.
Since I couldn't find any (closed) ticket concerned with this topic, I assume
no such feature has been denied until now.
> KafkaWriteTask: support timestamp customisation
> ------------------------------------------------
>
> Key: SPARK-50160
> URL: https://issues.apache.org/jira/browse/SPARK-50160
> Project: Spark
> Issue Type: Improvement
> Components: SQL, Structured Streaming
> Affects Versions: 3.5.3
> Reporter: Peter Schulz
> Priority: Major
> Labels: pull-request-available