[
https://issues.apache.org/jira/browse/SQOOP-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630368#comment-14630368
]
Gwen Shapira commented on SQOOP-1853:
-------------------------------------
A few comments:
1. I think we need to think deeper about requirements / design here.
For example, I would think that integration with the "incremental" feature is
an important design goal, since Kafka was pretty much designed to support only
incremental fetches.
Perhaps a short doc describing what this connector achieves, with a high-level
overview of how, would help this discussion.
2. The current patch does not partition the data at all. It is single-threaded.
I think this is a problem: both Kafka and Sqoop are designed to scale, so our
users will be disappointed with a single-threaded connector.
3. The current patch uses auto-commit of offsets to Kafka. This can lead to
data loss. The high-level consumer API does support manual commits, but they
make assumptions about state that may not allow us to commit when we really
need to (Destroy phase, IMO), so you may need to switch to using the "simple"
API.
4. You can look at "Camus" as an example of how Kafka->HDFS integration can
work. They do partitioning and offset commits very well.
5. We need to design how to store offsets for incremental. We also need to
think of how users can modify them. I'd like to allow them to reset to
arbitrary offsets, and also to "earliest / latest".
6. We currently support only a single topic. It should be easy to support more
if we want.
7. The current patch seems to assume that the data in Kafka will be a CSV that
matches the target schema. I'd love to see support for Avro too, but that can
be a follow-up.
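
To make points 2 and 5 a bit more concrete, here is a rough sketch of the two
decisions involved: how Kafka partitions could be spread over several Sqoop
partitions, and how a starting offset could be resolved from a stored value or
a user reset ("earliest", "latest", or an arbitrary offset). Every name here
(KafkaFromSketch, Reset, startOffset, assign) is hypothetical and not taken
from the patch, and it deliberately avoids the Kafka and Sqoop APIs. Clamping
an out-of-range offset is just one possible policy; failing the job instead
would be equally reasonable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: none of these names come from the SQOOP-1853 patch.
final class KafkaFromSketch {

    /** How the user asks the job to position itself in one Kafka partition. */
    enum Reset { STORED, EARLIEST, LATEST, EXPLICIT }

    /**
     * Picks the offset to start reading from.
     * earliest/latest are the bounds the broker currently reports for the
     * partition; stored is the offset saved by the last successful run
     * (negative if there is none); explicit is only used with Reset.EXPLICIT.
     */
    static long startOffset(Reset mode, long stored, long explicit,
                            long earliest, long latest) {
        long wanted;
        switch (mode) {
            case EARLIEST: wanted = earliest; break;
            case LATEST:   wanted = latest;   break;
            case EXPLICIT: wanted = explicit; break;
            default:       wanted = stored < 0 ? earliest : stored; break;
        }
        // Assumed policy: clamp into the range the broker still retains.
        return Math.max(earliest, Math.min(wanted, latest));
    }

    /** Round-robin assignment of Kafka partition ids to at most maxSplits groups. */
    static List<List<Integer>> assign(int kafkaPartitions, int maxSplits) {
        int splits = Math.max(1, Math.min(kafkaPartitions, maxSplits));
        List<List<Integer>> groups = new ArrayList<>();
        for (int i = 0; i < splits; i++) {
            groups.add(new ArrayList<>());
        }
        for (int p = 0; p < kafkaPartitions; p++) {
            groups.get(p % splits).add(p);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Five Kafka partitions shared by two Sqoop partitions.
        System.out.println(assign(5, 2));
        // First run: nothing stored, so start from the earliest retained offset.
        System.out.println(startOffset(Reset.STORED, -1, 0, 100, 250));
        // User asked for offset 40, which the broker no longer retains.
        System.out.println(startOffset(Reset.EXPLICIT, 0, 40, 100, 250));
    }
}
```

The stored offset would only be advanced once the job has finished
successfully (the Destroy phase mentioned in point 3), so a failed run is
re-read from the same position rather than skipped.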
> Sqoop2: Kafka connector supporting FROM direction
> -------------------------------------------------
>
> Key: SQOOP-1853
> URL: https://issues.apache.org/jira/browse/SQOOP-1853
> Project: Sqoop
> Issue Type: Sub-task
> Affects Versions: 1.99.6
> Reporter: Gwen Shapira
> Assignee: Richard
> Fix For: 1.99.7
>
> Attachments: SQOOP-1853.0.patch, SQOOP-1853.1.patch,
> SQOOP-1853.2.patch
>
>