Have you considered using the AvroIO.ReadAll transform? https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/extensions/avro/io/AvroIO.ReadAll.html
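A minimal sketch of what that could look like, assuming each Pub/Sub message payload is simply the gs:// path of one Avro file (the subscription name and schema below are placeholders):

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.extensions.avro.io.AvroIO;
    import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.values.PCollection;

    public class PubsubAvroRead {
      // Placeholder schema; in practice this must match your Avro files.
      private static final Schema SCHEMA = new Schema.Parser().parse(
          "{\"type\":\"record\",\"name\":\"Event\","
              + "\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}");

      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // Each message payload is assumed to be a gs:// path to an Avro file.
        PCollection<String> paths = p.apply("ReadFilePaths",
            PubsubIO.readStrings()
                .fromSubscription("projects/my-project/subscriptions/my-sub"));

        // ReadAll expands each path into its records; the work is split per
        // file, so autoscaling follows the file-read progress rather than the
        // Pub/Sub message rate.
        PCollection<GenericRecord> records =
            paths.apply("ReadAvroFiles", AvroIO.readAllGenericRecords(SCHEMA));

        p.run();
      }
    }

Note that readAll is deprecated on recent Beam versions in favor of the equivalent FileIO.matchAll() + FileIO.readMatches() + AvroIO.readFilesGenericRecords(...) chain, but either way you get the streaming behavior without hand-writing an unbounded source. On Dataflow you would run this with --streaming=true.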
On Sun, Oct 27, 2024 at 10:41 AM Francesco Scipioni <scipion...@gmail.com> wrote:
>
> Hi everyone,
>
> I am currently working on implementing a custom I/O connector with Apache
> Beam, specifically aiming to run it on Google Cloud Dataflow. Here is a
> quick overview of my workflow:
>
> I receive an event via Pub/Sub which includes a pointer to an Avro file
> stored in a GCS bucket. My goal is to wrap this in a streaming Dataflow
> pipeline whose autoscaling is driven by the GCS/Avro read progress rather
> than by the Pub/Sub metadata alone.
>
> My current approach is to create an *Unbounded Pub/Sub Source* that wraps a
> *Bounded GCS/Avro Source*, with checkpoints built from metadata of both the
> Avro file and the Pub/Sub message.
>
> However, I am running into difficulties implementing the checkpointing
> mechanism, and I am unsure of the best way to structure the project.
>
> Would anyone be able to provide guidance, suggestions, or examples? I'd
> really appreciate any help you can offer.
>
> Thank you very much!
>
> Best regards,
> Francesco
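P.S. If you do still want to build the custom unbounded source, the checkpoint mark is where the two layers meet. A rough sketch of the shape it could take; everything here except the UnboundedSource.CheckpointMark interface is an illustrative name, not a Beam API:

    import java.io.IOException;
    import java.io.Serializable;
    import org.apache.beam.sdk.io.UnboundedSource;

    // Illustrative checkpoint mark for an unbounded Pub/Sub source wrapping a
    // bounded Avro read. Field names are hypothetical, not part of Beam.
    public class AvroPubsubCheckpointMark
        implements UnboundedSource.CheckpointMark, Serializable {

      final String currentFile;   // gs:// path of the Avro file being read
      final long nextBlockOffset; // offset of the next Avro block in that file
      final String pendingAckId;  // Pub/Sub ack id of the message naming the file

      AvroPubsubCheckpointMark(
          String currentFile, long nextBlockOffset, String pendingAckId) {
        this.currentFile = currentFile;
        this.nextBlockOffset = nextBlockOffset;
        this.pendingAckId = pendingAckId;
      }

      @Override
      public void finalizeCheckpoint() throws IOException {
        // Called once the runner has durably committed everything read up to
        // this mark: ack pendingAckId here. If the worker crashes first, the
        // message is redelivered and the reader can resume the file from
        // nextBlockOffset.
      }
    }

Your source would also need to supply a Coder for this mark via getCheckpointMarkCoder(), and finalizeCheckpoint() is the natural place to ack the Pub/Sub message so that a half-read file is redelivered after a failure.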