Hi,
When reading through the datasource API like you are, the schema merging etc. behaves the same as spark.read.parquet(). Hudi merely filters the files on storage for the latest snapshot.
https://hudi.apache.org/docs/querying_data.html#read-optimized-query-1

thanks
Vinoth

On Thu, Feb 6, 2020 at 8:11 AM leesf <[email protected]> wrote:

> If you update the partition (20200205) after adding the fields, it will show
> the added fields by using `val hudiDF2 =
> spark.read.format("org.apache.hudi").load("/tmp/hudi/drivers/*");
> hudiDF2.show`, which does not need to merge the schema from all files.
>
> Igor Basko <[email protected]> wrote on Thu, Feb 6, 2020 at 8:35 PM:
>
> > Thanks a lot for the answer.
> > I was sure Hudi would store the latest schema, instead of merging it from
> > all the files.
> >
> > On Thu, 6 Feb 2020 at 01:10, leesf <[email protected]> wrote:
> >
> > > Hi Igor,
> > >
> > > It is because the Spark ParquetFileFormat infers the schema from the
> > > parquet file under the 20200205 dir, and that file does not contain the
> > > added column (direction). You could try `val hudiDF2 =
> > > spark.read.format("org.apache.hudi").option("mergeSchema",
> > > "true").load("/tmp/hudi/drivers/*")` to get the schema merged from
> > > 20200205 and 20200206, and it shows the added column. I do not know
> > > whether it is a common solution, but it solves the problem.
> > >
> > > Best,
> > > Leesf
> > >
> > > Igor Basko <[email protected]> wrote on Wed, Feb 5, 2020 at 3:33 PM:
> > >
> > > > Hi All,
> > > > I've tried to write data with some schema changes using the Datasource
> > > > Writer.
> > > > The procedure was:
> > > > First I wrote an event with a specific schema.
> > > > After that I wrote a different event with the same schema but with one
> > > > more added field.
> > > >
> > > > When I read from the Hudi table, I get both events with the original
> > > > schema.
> > > > I was expecting to get both events with the newer schema, with some
> > > > default value in the new field for the first event.
> > > >
> > > > I've created a gist that describes my experience:
> > > > https://gist.github.com/igorbasko01/4a1d0cf7c06a5b216382260efaa1f333
> > > >
> > > > I would like to know if schema evolution is supported using the
> > > > Datasource Writer.
> > > > Or maybe I'm doing something wrong.
> > > >
> > > > Thanks a lot.
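
For anyone landing on this thread later: below is a minimal spark-shell sketch of the mergeSchema workaround leesf describes above. It assumes the layout from Igor's gist (a Hudi table under /tmp/hudi/drivers with partitions 20200205 and 20200206, where the second write added a `direction` column) and the Hudi Spark bundle on the classpath; adjust the path and column names to your own setup.

  // Sketch only, run in spark-shell where `spark` is already defined.
  // Assumes a Hudi table under /tmp/hudi/drivers where partition 20200206
  // was written with an extra column (direction) that 20200205 lacks.

  // Default read: ParquetFileFormat may infer the schema from a file in the
  // older partition, so the added column can be missing from the result.
  val hudiDF = spark.read
    .format("org.apache.hudi")
    .load("/tmp/hudi/drivers/*")
  hudiDF.printSchema()

  // With mergeSchema enabled, the schema is merged across all the files, so
  // the added column shows up (null for rows written before it existed).
  val hudiDF2 = spark.read
    .format("org.apache.hudi")
    .option("mergeSchema", "true")
    .load("/tmp/hudi/drivers/*")
  hudiDF2.printSchema()
  hudiDF2.show()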
