Hi Igor,

This happens because Spark's ParquetFileFormat infers the schema from the
parquet file under the 20200205 dir, and that file does not contain the
added column (direction). You could try `val hudiDF2 =
spark.read.format("org.apache.hudi").option("mergeSchema",
"true").load("/tmp/hudi/drivers/*")` to get the schema merged from 20200205
and 20200206; it then shows the added column. I do not know whether this is
a common solution, but it solves the problem.
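In case it helps, here is a minimal sketch of the workaround end to end.
The paths are the ones from your gist; the SparkSession setup and app name
are illustrative, not something specific to Hudi:

// In spark-shell a SparkSession named `spark` already exists; the
// builder below is only needed in a standalone app.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hudi-merge-schema")
  .master("local[*]")
  .getOrCreate()

// Default read: Spark infers the schema from a single parquet footer
// (here the 20200205 partition), so `direction` is missing.
val hudiDF = spark.read
  .format("org.apache.hudi")
  .load("/tmp/hudi/drivers/*")
hudiDF.printSchema()

// With mergeSchema, Spark unions the schemas of all parquet files it
// reads, so the `direction` column added on 20200206 shows up.
val hudiDF2 = spark.read
  .format("org.apache.hudi")
  .option("mergeSchema", "true")
  .load("/tmp/hudi/drivers/*")
hudiDF2.printSchema()

Note that rows written before the column existed come back as null rather
than a default value.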

Best,
Leesf

Igor Basko <[email protected]> wrote on Wed, Feb 5, 2020 at 3:33 PM:

> Hi All,
> I've tried to write data with some schema changes using the Datasource
> Writer.
> The procedure was:
> First I wrote an event with a specific schema.
> After that I wrote a different event with the same schema but with one more
> added field.
>
> When I read from the Hudi table, I get both the events, with the original
> schema.
> I was expecting to get both events with the newer schema with some default
> value in the new
> field for the first event.
>
> I've created a gist that describes my experience:
> https://gist.github.com/igorbasko01/4a1d0cf7c06a5b216382260efaa1f333
>
> I would like to know if schema evolution is supported using the Datasource
> Writer, or whether I'm doing something wrong.
>
> Thanks a lot.
>
