[
https://issues.apache.org/jira/browse/CRUNCH-606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275344#comment-15275344
]
Josh Wills commented on CRUNCH-606:
-----------------------------------
My first thought would be to delegate the deserialization to the PType logic:
have the KafkaInputFormat always return instances of BytesWritable/ByteBuffer
for the keys/values, and leave it up to the PType that was passed in to the
KafkaSource to map those bytes into the appropriate type, with some helper
functions along the lines of those we put into the PTypes class. An AvroType is
always going to expect an AvroWrapper from any Avro-based input format, so it
may be that a WritableTypeFamily/BytesWritable is the right base for the
KafkaSource, even when the bytes themselves are serialized with Avro.
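
Roughly, the delegation described above could look like the sketch below. It is
a plain-Java analogue only, not Crunch or Kafka code: the class
BytesDelegationSketch and the names rawRecords, UTF8_STRING, and read are all
hypothetical stand-ins for the input format, a PTypes-style helper, and the
source respectively.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Plain-Java analogue only: none of these names are Crunch or Kafka APIs.
class BytesDelegationSketch {

    // Stand-in for the input format: it never deserializes, it only
    // hands back the raw bytes of each record.
    static List<ByteBuffer> rawRecords(String... payloads) {
        List<ByteBuffer> out = new ArrayList<>();
        for (String p : payloads) {
            out.add(ByteBuffer.wrap(p.getBytes(StandardCharsets.UTF_8)));
        }
        return out;
    }

    // Stand-in for a PTypes-style helper: a bytes-to-T mapping chosen by the
    // caller. duplicate() leaves the original buffer's position untouched.
    static final Function<ByteBuffer, String> UTF8_STRING =
        buf -> StandardCharsets.UTF_8.decode(buf.duplicate()).toString();

    // Stand-in for the source: it applies whatever mapping came with the
    // requested type, so the input side stays type-agnostic.
    static <T> List<T> read(List<ByteBuffer> raw, Function<ByteBuffer, T> mapper) {
        List<T> out = new ArrayList<>();
        for (ByteBuffer b : raw) {
            out.add(mapper.apply(b));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> values = read(rawRecords("k1", "v1"), UTF8_STRING);
        System.out.println(values);
    }
}
```

The point of the split is that supporting another serialization (Avro, for
example) would mean supplying a different mapping function, while the part
that produces raw bytes stays unchanged.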
> Create a KafkaSource
> --------------------
>
> Key: CRUNCH-606
> URL: https://issues.apache.org/jira/browse/CRUNCH-606
> Project: Crunch
> Issue Type: New Feature
> Components: IO
> Reporter: Micah Whitacre
> Assignee: Micah Whitacre
> Attachments: CRUNCH-606.patch
>
>
> Pulling data out of Kafka is a common use case, and some of the ways to do it
> (Kafka Connect, Camus, Gobblin) do not integrate nicely with existing
> processing pipelines like Crunch. With Kafka 0.9, the consuming API is a lot
> easier, so we should build a Source implementation that can read from Kafka.