[ 
https://issues.apache.org/jira/browse/CRUNCH-606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275344#comment-15275344
 ] 

Josh Wills commented on CRUNCH-606:
-----------------------------------

So my first thought would be to delegate the deserialization to the PType 
logic: have the KafkaInputFormat always return instances of 
BytesWritable/ByteBuffer for the keys/values, and leave it up to the PType that 
was passed in to the KafkaSource to map those bytes into the appropriate type, 
with some helper functions along the lines of the ones we put into the PTypes 
class. An AvroType is always going to expect an AvroWrapper from any 
Avro-based input format, so it may be that a WritableTypeFamily/BytesWritable 
base for the KafkaSource is the way to go, even when the bytes themselves are 
serialized with Avro.
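
A rough sketch of the shape I have in mind, using only the JDK so it stands 
alone. RawRecord and BytesMappers are hypothetical names made up for 
illustration, not Crunch or Kafka classes; in Crunch the mapping would 
presumably live in a derived PType built on a bytes-based Writable type 
rather than in a bare Function.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical stand-in for what the input format would emit: key and value
// are always raw bytes, no matter how the producer serialized them.
final class RawRecord {
    final ByteBuffer key;
    final ByteBuffer value;

    RawRecord(ByteBuffer key, ByteBuffer value) {
        this.key = key;
        this.value = value;
    }
}

// Hypothetical helpers that turn raw bytes into a concrete type. The caller
// picks the mapping, so the input format never needs to know about it.
final class BytesMappers {
    private BytesMappers() {}

    // Copies the remaining bytes without moving the caller's buffer position.
    static byte[] toArray(ByteBuffer buf) {
        ByteBuffer copy = buf.duplicate();
        byte[] out = new byte[copy.remaining()];
        copy.get(out);
        return out;
    }

    // Decodes the remaining bytes as a UTF-8 string.
    static final Function<ByteBuffer, String> UTF8 =
        buf -> new String(toArray(buf), StandardCharsets.UTF_8);

    // Reads the first 8 remaining bytes as a big-endian long.
    static final Function<ByteBuffer, Long> LONG =
        buf -> buf.duplicate().getLong();

    // Applies the caller-supplied mapping to the record's value bytes.
    static <T> T mapValue(RawRecord record, Function<ByteBuffer, T> fn) {
        return fn.apply(record.value);
    }
}
```

An Avro mapping would be one more Function<ByteBuffer, T> of the same shape, 
which is why the source itself can stay bytes-only.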

> Create a KafkaSource
> --------------------
>
>                 Key: CRUNCH-606
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-606
>             Project: Crunch
>          Issue Type: New Feature
>          Components: IO
>            Reporter: Micah Whitacre
>            Assignee: Micah Whitacre
>         Attachments: CRUNCH-606.patch
>
>
> Pulling data out of Kafka is a common use case, and some of the ways to do it 
> (Kafka Connect, Camus, Gobblin) do not integrate nicely with existing 
> processing pipelines like Crunch.  With Kafka 0.9, the consumer API is a lot 
> easier to use, so we should build a Source implementation that can read from 
> Kafka.



