[ 
https://issues.apache.org/jira/browse/SQOOP-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630368#comment-14630368
 ] 

Gwen Shapira commented on SQOOP-1853:
-------------------------------------

A few comments:

1. I think we need to think deeper about the requirements and design here.
For example, I would think that integration with the "incremental" feature is
an important design goal, since Kafka was pretty much designed to support only
incremental fetches.
Perhaps a short doc describing what this connector achieves, plus a high-level
overview of how, would help this discussion.

2. The current patch does not partition the data at all. It is single-threaded.
I think this is a problem: both Kafka and Sqoop are designed to scale, so our
users will be disappointed with a single-threaded connector.

3. The current patch uses auto-commit of offsets to Kafka. This can lead to
data loss, because offsets may be committed before the data has actually been
written to the target. The high-level consumer API does support manual commits,
but it makes assumptions about state that may not allow us to commit when we
really need to (Destroy phase, IMO), so you may need to switch to using the
"simple" API.

4. You can look at "Camus" as an example of how Kafka->HDFS integration can
work. They do partitioning and offset commits very well.

5. We need to design how to store offsets for the "incremental" feature. We
also need to think about how users can modify them. I'd like to allow them to
reset to arbitrary offsets, and also to "earliest / latest".

6. We currently support only a single topic. It should be easy to support more
if we want.

7. The current patch seems to assume that the data in Kafka will be a CSV that
matches the target schema. I'd love to see support for Avro too, but that can
be a follow-up.
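
To make point 2 concrete, here is a rough sketch (Python for brevity, not
connector code) of how Kafka partitions could be spread over a bounded number
of Sqoop partitions so that several extractors read in parallel. The function
name and the round-robin policy are assumptions for illustration only; nothing
like this exists in the patch.

```python
def assign_partitions(kafka_partitions, max_sqoop_partitions):
    """Spread Kafka partition ids round-robin over at most
    max_sqoop_partitions groups (hypothetical helper, illustration only)."""
    if max_sqoop_partitions < 1:
        raise ValueError("max_sqoop_partitions must be at least 1")
    # Never create more groups than there are Kafka partitions to read.
    groups = min(max_sqoop_partitions, len(kafka_partitions))
    assignment = [[] for _ in range(groups)]
    for i, partition_id in enumerate(sorted(kafka_partitions)):
        assignment[i % groups].append(partition_id)
    return assignment
```

With five Kafka partitions and a limit of two, assign_partitions([0, 1, 2, 3, 4], 2)
returns [[0, 2, 4], [1, 3]], so each group could be handed to its own extractor.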

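Points 3 and 5 could be sketched roughly as below, assuming offsets are kept
per partition in some store owned by the job. Everything here is hypothetical:
resolve_start_offset, run_job, the fetch/write callables and the dict used as
an offset store are placeholders, not Kafka or Sqoop APIs. Clamping an
out-of-range offset to what the partition currently holds is also just one
possible policy.

```python
def resolve_start_offset(stored, reset, earliest, latest):
    """Pick the offset to start reading from for one partition.

    stored   -- offset saved by the previous run, or None on the first run
    reset    -- None, "earliest", "latest", or an explicit integer offset
    earliest -- smallest offset currently available in the partition
    latest   -- offset just past the newest message in the partition
    """
    if reset == "earliest":
        return earliest
    if reset == "latest":
        return latest
    if isinstance(reset, int):
        wanted = reset
    elif reset is None:
        wanted = stored if stored is not None else earliest
    else:
        raise ValueError("unsupported reset value: %r" % (reset,))
    # Assumed policy: clamp into the range the partition actually holds.
    return max(earliest, min(wanted, latest))


def run_job(fetch, write, offset_store, partition, start, end):
    """Read [start, end) and record the new offset only after the write
    succeeded, which is the opposite of auto-commit."""
    records = fetch(partition, start, end)
    write(records)                  # if this raises, nothing is committed
    offset_store[partition] = end   # commit point, e.g. a Destroy-like phase
    return len(records)
```

If write fails, the stored offset is left untouched and the next run re-reads
the same range, so the failure mode is duplicates rather than lost data.
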
> Sqoop2: Kafka connector supporting FROM direction
> -------------------------------------------------
>
>                 Key: SQOOP-1853
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1853
>             Project: Sqoop
>          Issue Type: Sub-task
>    Affects Versions: 1.99.6
>            Reporter: Gwen Shapira
>            Assignee: Richard
>             Fix For: 1.99.7
>
>         Attachments: SQOOP-1853.0.patch, SQOOP-1853.1.patch, 
> SQOOP-1853.2.patch
>
>




