Hi Katie,

Thanks for explaining the problem in detail. Could you share some more information so I can help you with this?
1. What table type are you using - COPY_ON_WRITE or MERGE_ON_READ?
2. Could you paste the exception you see in Hudi?
3. "Despite the schema having full compatibility" -> Can you explain what you mean by "full compatibility"? (I've sketched one way to check this with Avro's SchemaCompatibility utility below your quoted message.)

Thanks,
Nishith

On Tue, Jun 25, 2019 at 10:32 AM Katie Frost <[email protected]> wrote:

> Hi,
>
> I've been using the Hudi DeltaStreamer to create datasets in S3, and I've
> had issues with Hudi acknowledging schema compatibility.
>
> I'm trying to run a Spark job ingesting Avro data into a Hudi dataset in S3,
> with the raw Avro source data also stored in S3. The raw Avro data has two
> different schema versions, and I have supplied the job with the latest
> schema. However, the job fails to ingest any of the data that is not up to
> date with the latest schema and ingests only the data matching the given
> schema, despite the schema having full compatibility. Is this a known
> issue, or just a case of missing configuration?
>
> The error I get when running the job against the data not up to date with
> the latest Avro schema is an ArrayIndexOutOfBoundsException, and I know it
> is a schema issue because I have tested running the job with the older schema
> version, removing any data that matches the latest schema, and the job runs
> successfully.
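For reference on question 3: in Avro, "full" compatibility usually means the old and new schema versions can each read data written with the other, which in practice requires any added fields to carry defaults. Here is a minimal sketch of how you could verify that for your two schema versions using Avro's SchemaCompatibility utility - the inline schema strings are placeholders, so swap in your actual v1 and v2 schemas:

    import org.apache.avro.Schema;
    import org.apache.avro.SchemaCompatibility;
    import org.apache.avro.SchemaCompatibility.SchemaPairCompatibility;

    public class CheckSchemaCompatibility {
      public static void main(String[] args) {
        // Placeholder schemas: v1 is the older writer schema, v2 is the latest
        // schema supplied to the DeltaStreamer job. Replace with your real schemas.
        Schema v1 = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
          + "{\"name\":\"id\",\"type\":\"string\"}]}");
        Schema v2 = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
          + "{\"name\":\"id\",\"type\":\"string\"},"
          + "{\"name\":\"source\",\"type\":[\"null\",\"string\"],\"default\":null}]}");

        // Full compatibility = both directions work: the latest schema can read
        // data written with the old one, and the old schema can read the latest.
        SchemaPairCompatibility latestReadsOld =
            SchemaCompatibility.checkReaderWriterCompatibility(v2, v1);
        SchemaPairCompatibility oldReadsLatest =
            SchemaCompatibility.checkReaderWriterCompatibility(v1, v2);

        System.out.println("latest reads old: " + latestReadsOld.getType());
        System.out.println("old reads latest: " + oldReadsLatest.getType());
      }
    }

If both directions print COMPATIBLE, the schemas themselves are likely fine and the issue is more likely in how the schema is supplied to the job, which is why the exact exception and table type would help narrow this down.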
