Re: Datasource Writer Schema Evolution

Igor Basko Thu, 06 Feb 2020 04:35:19 -0800

Thanks a lot for the answer.
I was sure Hudi would store the latest schema, instead of merging it from
all the files.
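
For anyone finding this thread later, here is a minimal sketch of the behaviour described in the reply quoted below. It uses plain Parquet directories rather than an actual Hudi table. The temp directory, the object name, and the `driver` and `ts` columns are made up for illustration; only the `direction` column and the two date directories come from the thread.

```scala
import org.apache.spark.sql.SparkSession

object MergeSchemaSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("merge-schema-sketch")
      .master("local[1]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical scratch location, not the table path from the thread.
    val base = java.nio.file.Files.createTempDirectory("merge-schema").toString

    // Older directory: written before the `direction` column existed.
    Seq(("d1", 1L)).toDF("driver", "ts")
      .write.parquet(s"$base/20200205")

    // Newer directory: same columns plus the added `direction` column.
    Seq(("d2", 2L, "north")).toDF("driver", "ts", "direction")
      .write.parquet(s"$base/20200206")

    // With mergeSchema enabled, Spark unions the schemas of all files
    // instead of taking the schema from a single file.
    val merged = spark.read
      .option("mergeSchema", "true")
      .parquet(s"$base/*")

    merged.printSchema()
    assert(merged.columns.contains("direction"))

    spark.stop()
  }
}
```

Rows from the older directory should come back with a null `direction`. Without the option, which file Spark takes the schema from is not something I would rely on.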


On Thu, 6 Feb 2020 at 01:10, leesf <[email protected]> wrote:

> Hi Igor,
>
> It is because Spark's ParquetFileFormat infers the schema from the Parquet
> file under the 20200205 dir, and that file does not contain the added
> column (direction). You could try `val hudiDF2 =
> spark.read.format("org.apache.hudi").option("mergeSchema",
> "true").load("/tmp/hudi/drivers/*")` to get the schema merged from 20200205
> and 20200206, which shows the added column. I do not know whether it is a
> common solution, but it solves the problem.
>
> Best,
> Leesf
>
> On Wed, 5 Feb 2020 at 15:33, Igor Basko <[email protected]> wrote:
>
> > Hi All,
> > I've tried to write data with some schema changes using the Datasource
> > Writer.
> > The procedure was:
> > First I wrote an event with a specific schema.
> > After that I wrote a different event with the same schema but with one
> > more added field.
> >
> > When I read from the Hudi table, I get both the events, with the original
> > schema.
> > I was expecting to get both events with the newer schema, with some
> > default value in the new field for the first event.
> >
> > I've created a gist that describes my experience:
> > https://gist.github.com/igorbasko01/4a1d0cf7c06a5b216382260efaa1f333
> >
> > I would like to know whether schema evolution is supported using the
> > Datasource Writer, or whether I'm doing something wrong.
> >
> > Thanks a lot.
> >
>
