[
https://issues.apache.org/jira/browse/CRUNCH-606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Micah Whitacre updated CRUNCH-606:
----------------------------------
Attachment: CRUNCH-606.diff
So the way the source is currently written, it will support reading from 1:M
topics as long as they all produce the same payloads. This fits our use case,
where our topics are segregated by source but the payloads are all the same. In
theory the Serializer/Deserializer flexibility is per topic, to support things
like migration of the serialized format (e.g. person_json and person_avro could
both produce Person objects even though their byte forms might differ).
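To illustrate the per-topic idea (just a sketch, not the API in the patch), a
Kafka Deserializer that routes by topic name could look roughly like this; the
delegate map and the Person payload type it alludes to are hypothetical:
{code:java}
// Illustrative only: a generic Kafka Deserializer that picks a delegate based on
// the topic name, so person_json and person_avro could use different byte formats
// while both yielding the same payload type T (e.g. Person).
import java.util.Map;

import org.apache.kafka.common.serialization.Deserializer;

public class PerTopicDeserializer<T> implements Deserializer<T> {

  private final Map<String, Deserializer<T>> delegates;

  public PerTopicDeserializer(Map<String, Deserializer<T>> delegates) {
    this.delegates = delegates;
  }

  @Override
  public void configure(Map<String, ?> configs, boolean isKey) {
    // Delegates are assumed to be configured by the caller.
  }

  @Override
  public T deserialize(String topic, byte[] data) {
    // Route by topic, e.g. "person_json" -> JSON delegate, "person_avro" -> Avro delegate.
    return delegates.get(topic).deserialize(topic, data);
  }

  @Override
  public void close() {
    for (Deserializer<T> delegate : delegates.values()) {
      delegate.close();
    }
  }
}
{code}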
To be honest we don't really have this use case, but it has been indicated that
this is how things like Confluent's Schema Registry handle passivity (backwards
compatibility of the serialized form).
Attached is the patch of my code so far. I think the only missing pieces are the
converter and more integration testing.
You'll notice that right now I have the KafkaSource taking in a PTableType.
While I'd love for the source to be PTypeFamily agnostic, it seems like I would
either have to change the InputFormat/RecordReader to only support the
AvroTypeFamily or write a ConverterShim. I'll play around with this some more.
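For context, here is a rough sketch of how a pipeline might consume such a
source, assuming (my guess, not necessarily the shape in the attached patch) a
constructor that takes the Kafka properties, the PTableType, and the topic
names:
{code:java}
// Hypothetical usage sketch; the KafkaSource constructor shape is assumed.
import java.nio.ByteBuffer;
import java.util.Properties;

import org.apache.crunch.PTable;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.types.avro.Avros;

public class KafkaSourceSketch {
  public static void main(String[] args) {
    Properties kafkaProps = new Properties();
    kafkaProps.put("bootstrap.servers", "broker1:9092"); // placeholder broker

    Pipeline pipeline = new MRPipeline(KafkaSourceSketch.class);

    // Assumed constructor: Kafka config, the PTableType for key/value payloads,
    // and the 1:M topics that all produce the same payload form.
    PTable<ByteBuffer, ByteBuffer> raw = pipeline.read(
        new KafkaSource<ByteBuffer, ByteBuffer>(
            kafkaProps,
            Avros.tableOf(Avros.bytes(), Avros.bytes()),
            "person_json", "person_avro"));

    // Downstream DoFns would turn the raw bytes into Person records.
    pipeline.done();
  }
}
{code}
Whether that PTableType has to come from the AvroTypeFamily is exactly the open
question above.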
> Create a KafkaSource
> --------------------
>
> Key: CRUNCH-606
> URL: https://issues.apache.org/jira/browse/CRUNCH-606
> Project: Crunch
> Issue Type: New Feature
> Components: IO
> Reporter: Micah Whitacre
> Assignee: Micah Whitacre
> Attachments: CRUNCH-606.diff, CRUNCH-606.patch
>
>
> Pulling data out of Kafka is a common use case, and some of the existing ways
> to do it (Kafka Connect, Camus, Gobblin) do not integrate nicely with existing
> processing pipelines like Crunch. With Kafka 0.9 the consuming API is a lot
> easier, so we should build a Source implementation that can read from Kafka.
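For reference, the "a lot easier" consuming API mentioned in the description is
the new KafkaConsumer introduced in Kafka 0.9; a minimal standalone sketch
(broker address, group id, and topic names are placeholders):
{code:java}
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "broker1:9092"); // placeholder broker
    props.put("group.id", "crunch-example");        // placeholder group id
    props.put("key.deserializer",
        "org.apache.kafka.common.serialization.ByteArrayDeserializer");
    props.put("value.deserializer",
        "org.apache.kafka.common.serialization.ByteArrayDeserializer");

    try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<byte[], byte[]>(props)) {
      // Subscribe to 1:M topics that all carry the same payload form.
      consumer.subscribe(Arrays.asList("person_json", "person_avro"));

      ConsumerRecords<byte[], byte[]> records = consumer.poll(1000);
      for (ConsumerRecord<byte[], byte[]> record : records) {
        // Each record carries its topic, partition, offset, key, and value.
        System.out.printf("%s-%d@%d%n", record.topic(), record.partition(), record.offset());
      }
    }
  }
}
{code}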