Ingestion Layer?

Austin Bennett Thu, 09 Nov 2023 09:53:41 -0800

Just a quick comment: I imagine an ingestion layer would bring massive
value.


Making it easier to add more integrations would greatly benefit the
ecosystem and adoption.


Concretely, FWIW, I'm evaluating Iceberg [ and alternatives ] for
enterprise adoption, and existing integrations [ both for reading from and
ingesting into Iceberg ] and the ease of contributing missing integrations
are TOP of mind.



On Mon, Oct 2, 2023 at 11:03 PM Jean-Baptiste Onofré <[email protected]>
wrote:

> From my standpoint, Kafka Connect is also interesting because it can
> address processing logic without a Spark or Flink runtime. It is
> definitely interesting to have Kafka integration/processing (even though,
> for me, Kafka and Kafka Connect are two different things ;)).
>
> For the pure data ingestion part, I think it would make sense to have an
> "ingestion layer" in Iceberg with pluggable IO, where we can both
> implement our own IOs (specific to Iceberg, like Apache Beam IOs for
> instance) and leverage existing integration frameworks (like Apache
> Camel).
> Why not have JMS/ActiveMQ integration in Iceberg via an IO, or Pulsar
> integration? I think such a layer would be very interesting for the
> community and would bring us more users. That's what happened at Apache
> Beam: the first IOs were only Google-"centric" (Bigtable, BigQuery,
> gfs, ...); we added new IOs (JMS, Kafka, JDBC, ...) and saw a great
> benefit for adoption :).
> DISCLAIMER: I've implemented IOs in Beam and components in Camel ;)
>
> I will investigate this and draft a proposal.
>
> Regards
> JB
>
> On Tue, Oct 3, 2023 at 7:23 AM Ajantha Bhat <[email protected]> wrote:
> >
> > Hi Bryan,
> >
> > I am very happy to see this contribution.
> > I have recently tested this project with the Nessie catalog and very
> > much liked it.
> >
> > However, I still don't see the benefits of using Kafka Connect instead
> > of consuming directly from Kafka, as Delta Lake's implementation does:
> > https://github.com/delta-io/kafka-delta-ingest/blob/main/doc/DESIGN.md
> >
> > I am not an expert in this ingestion domain and only recently got
> > started. I hope someone will chime in and that we will have a detailed
> > analysis of the design.
> >
> > Looking forward to this feature.
> >
> > Thanks,
> > Ajantha
> >
> > On Tue, Oct 3, 2023 at 12:18 AM Jean-Baptiste Onofré <[email protected]>
> > wrote:
> >>
> >> Hi Bryan
> >>
> >> That’s great news! Thanks a lot for the proposal.
> >>
> >> I will take a look at the PR and the existing connector.
> >> I’m sure the Iceberg community will be very happy to see this, and we
> >> will be able to add new features and improvements thanks to community
> >> feedback.
> >> I would be more than happy to help with the donation (I know that the
> >> connector is already under the Apache license, but we have to
> >> double-check the ICLAs for the initial contributors etc., just to be
> >> sure we are good there).
> >>
> >> Thanks again !
> >>
> >> Let’s see what the others are thinking.
> >>
> >> Regards
> >> JB
> >>
> >> On Mon, Oct 2, 2023 at 7:39 PM, Bryan Keller <[email protected]> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> We at Tabular would like to contribute our Kafka Connect Iceberg sink
> >>> to the Iceberg project. It would be great to give Iceberg users another
> >>> option for landing data from Kafka into Iceberg tables that is supported
> >>> by the Iceberg community. Kafka Connect is a part of systems from AWS,
> >>> Confluent, Redpanda, and so on, so it can make landing data from Kafka
> >>> into Iceberg much easier for those without a Flink or Spark
> >>> infrastructure.
> >>>
> >>> There are a few Iceberg sink implementations out there for Kafka
> >>> Connect, but we feel this one covers most of the features users have
> >>> requested, such as exactly-once processing, schema evolution, and
> >>> multi-table fanout. And having the sink backed by the Iceberg community
> >>> will help it to evolve and improve over time.
> >>>
> >>> If this sounds like something everyone would like to see added to
> >>> Iceberg, I've opened a PR that includes some initial pieces of the sink.
> >>> The thought was to break up the submission into parts so each could be
> >>> reviewed more easily. Some design docs and notes can be found in the
> >>> original repo here:
> >>> https://github.com/tabular-io/iceberg-kafka-connect/tree/main/docs
> >>>
> >>> We'd like to get feedback on whether others approve of moving forward
> >>> with this.
> >>>
> >>> Thanks,
> >>> Bryan
> >>>
>
