Hi,

I've been using the Hudi DeltaStreamer to create datasets in S3, and I've
run into issues with Hudi recognizing schema compatibility.

I'm trying to run a Spark job that ingests Avro data into a Hudi dataset in
S3, with the raw Avro source data also stored in S3. The raw Avro data
exists in two different schema versions, and I have supplied the job with
the latest schema. However, the job ingests only the records matching the
supplied schema and fails on any records written with the older schema,
even though the two schema versions are fully compatible. Is this a known
issue, or am I just missing some configuration?
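To clarify what I mean by "full compatibility": the newer schema only adds
fields that carry defaults, so under Avro's standard schema-resolution rules
records written with either version should be readable with the other. A
minimal sketch of the shape of my newer schema (the record and field names
here are made up for illustration; only the added defaulted field matters):

```json
{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "ts", "type": "long"},
    {"name": "new_field", "type": ["null", "string"], "default": null}
  ]
}
```

The older version is identical minus `new_field`; since that field is
nullable with a default, reading old records with the new schema should
just fill in the default rather than fail.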

When the job reads records written with the older schema, it fails with an
ArrayIndexOutOfBoundsException. I'm confident this is a schema issue
rather than bad data: if I instead supply the older schema and remove any
data matching the latest schema, the job runs successfully.
