The PR to fix this is https://github.com/apache/incubator-iceberg/pull/108
I need to look into a couple of task failures, but could you validate that it works as you expect? Thanks!

On Wed, Feb 20, 2019 at 1:04 PM suds <sudssf2...@gmail.com> wrote:

> Thank you for looking into this issue. I was planning to debug the issue this
> week, but it looks like you already figured it out :)
> I will follow the issue on GitHub to learn more about the fix.
>
> --
> Thanks
>
> On Wed, Feb 20, 2019 at 11:13 AM Ryan Blue <rb...@netflix.com> wrote:
>
>> Sudsport,
>>
>> Good catch here, and thank you for the gist that reproduces the issue.
>>
>> The problem happens when pushing predicates down to manifest files.
>> Manifests keep track of the schema and partition spec that was used to
>> write the manifest. The reader currently uses that schema when converting
>> and binding predicates to evaluate on the partition data in the manifest.
>> So this is a bug where we haven't passed the current table schema down to
>> the manifest reader.
>>
>> I'll open an issue for it and fix this. Thanks!
>>
>> rb
>>
>> On Fri, Feb 15, 2019 at 11:34 AM suds <sudssf2...@gmail.com> wrote:
>>
>>> Thanks for the reply, Ryan.
>>>
>>> I created a gist with a code example:
>>>
>>> https://gist.github.com/sudssf/e5f2de7463487f98c0a269221bbe0f1a
>>>
>>> Please let me know if I am not using the API correctly.
>>>
>>>
>>> On Thu, Feb 14, 2019 at 5:38 PM Ryan Blue <rb...@netflix.com> wrote:
>>>
>>>> Sudsport,
>>>>
>>>> I'm wondering if you had the table cached somewhere? Those renames
>>>> should work. My guess is that the query used a table version that was out
>>>> of date.
>>>>
>>>> Can you put together a minimal script that reproduces the error and
>>>> open an issue? That way I can fix it.
>>>>
>>>> rb
>>>>
>>>> On Thu, Feb 14, 2019 at 3:01 PM sudsport s <sudssf2...@gmail.com>
>>>> wrote:
>>>>
>>>>> Adding dev@iceberg.apache.org
>>>>>
>>>>>
>>>>> On Thu, Feb 14, 2019 at 3:00 PM sudsport s <sudssf2...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi, I am doing some testing with schema evolution.
>>>>>> I looked at the testSchemaUpdate method and the SchemaUpdate class for
>>>>>> reference.
>>>>>>
>>>>>>
>>>>>> Here are the steps I am doing to test schema evolution validation.
>>>>>>
>>>>>> Initially, data is created with the following schema, using "key" as
>>>>>> the partition key:
>>>>>>
>>>>>> root
>>>>>> |-- id: string (nullable = true)
>>>>>> |-- value: string (nullable = true)
>>>>>> |-- key: integer (nullable = false)
>>>>>> |-- value1: string (nullable = true)
>>>>>> |-- value2: string (nullable = true)
>>>>>>
>>>>>> Schema update to rename value1 -> v1:
>>>>>>
>>>>>> root
>>>>>> |-- id: string (nullable = true)
>>>>>> |-- value: string (nullable = true)
>>>>>> |-- key: integer (nullable = false)
>>>>>> |-- v1: string (nullable = true)
>>>>>> |-- value2: string (nullable = true)
>>>>>>
>>>>>> Schema update to rename key -> newKey (I know changing the partition
>>>>>> key is not a good idea, but this is a test :) ):
>>>>>>
>>>>>> root
>>>>>> |-- id: string (nullable = true)
>>>>>> |-- value: string (nullable = true)
>>>>>> |-- newKey: integer (nullable = false)
>>>>>> |-- v1: string (nullable = true)
>>>>>> |-- value2: string (nullable = true)
>>>>>>
>>>>>>
>>>>>> When I read the data frame using Spark, I get the following schema:
>>>>>>
>>>>>> root
>>>>>> |-- id: string (nullable = true)
>>>>>> |-- value: string (nullable = true)
>>>>>> |-- newKey: integer (nullable = false)
>>>>>> |-- v1: string (nullable = true)
>>>>>> |-- value2: string (nullable = true)
>>>>>>
>>>>>> But when I try to run a query or scan using a changed column in the
>>>>>> where clause, I get the following exception:
>>>>>>
>>>>>>
>>>>>> INFO TableScan: Scanning table /tmp/schema-evolution snapshot 1550184572006 created at 2019-02-14 14:49:32.189 with filter not_null(ref(name="v1"))
>>>>>> Exception in thread "main" org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
>>>>>> Exchange SinglePartition
>>>>>> +- *(1) HashAggregate(keys=[], functions=[partial_count(1)], output=[count#77L])
>>>>>>    +- *(1) Project
>>>>>>       +- *(1) Filter (isnotnull(v1#60) && (cast(v1#60 as int) = 0))
>>>>>>          +- *(1) DataSourceV2Scan [v1#60], IcebergScan(table=/tmp/schema-evolution, type=struct<4: v1: optional string>, filters=[not_null(ref(name="v1"))])
>>>>>>
>>>>>> at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
>>>>>> at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:119)
>>>>>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>>>>>>
>>>>>> Caused by: com.netflix.iceberg.exceptions.ValidationException: Cannot find field 'v1' in struct: struct<1: id: optional string, 2: value: optional string, 3: key: required int, 4: value1: optional string, 5: value2: optional string>
>>>>>> at com.netflix.iceberg.exceptions.ValidationException.check(ValidationException.java:39)
>>>>>> at com.netflix.iceberg.expressions.UnboundPredicate.bind(UnboundPredicate.java:46)
>>>>>>
>>>>>>
>>>>>> I ran the same query using various where-clause combinations: "v1 = 0",
>>>>>> "value1 = 0", "key = 0", and "newKey = 0".
>>>>>>
>>>>>> What is the best way to query data in an Iceberg table when the schema
>>>>>> has changed?
>>>>>>
>>>>>>
>>>>>> The following is output from the metadata JSON:
>>>>>>
>>>>>>
>>>>>> < "name" : "key",
>>>>>> ---
>>>>>> > "name" : "newKey",
>>>>>> 25c25
>>>>>> < "name" : "value1",
>>>>>> ---
>>>>>> > "name" : "v1",
>>>>>>
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "Iceberg Developers" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to iceberg-devel+unsubscr...@googlegroups.com.
>>>>>> To post to this group, send email to iceberg-de...@googlegroups.com.
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/iceberg-devel/3efe985e-2302-412b-a899-8efe1fbf13c8%40googlegroups.com
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Netflix
>>>>
>>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>

--
Ryan Blue
Software Engineer
Netflix
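
For readers following the thread, the failure described above (a predicate on the renamed column "v1" being bound against the schema stored with an older manifest) can be illustrated with a small toy model. This is only a sketch under stated assumptions: Field, Schema, and bind_not_null are hypothetical names made up for illustration and are not the Iceberg API, and the error text is shortened. The field IDs and names are the ones shown in the exception in the original report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema column; not an Iceberg class."""
    field_id: int
    name: str
    type: str
    required: bool = False


class Schema:
    """Toy schema: names can change over time, field IDs stay fixed."""

    def __init__(self, fields):
        self.fields = list(fields)

    def find_id(self, name):
        # Resolve a column name to its stable field ID.
        for f in self.fields:
            if f.name == name:
                return f.field_id
        raise ValueError(f"Cannot find field '{name}' in struct")

    def rename(self, old, new):
        # A rename produces a new schema with the same IDs and one new name.
        return Schema(
            Field(f.field_id, new if f.name == old else f.name, f.type, f.required)
            for f in self.fields
        )


def bind_not_null(schema, column):
    """Hypothetical binding step: turn a name-based filter into an ID-based one."""
    return ("not_null", schema.find_id(column))


# Schema as it was when the data (and its manifest) was written.
written = Schema([
    Field(1, "id", "string"),
    Field(2, "value", "string"),
    Field(3, "key", "int", required=True),
    Field(4, "value1", "string"),
    Field(5, "value2", "string"),
])

# Current table schema after the two renames from the report.
current = written.rename("value1", "v1").rename("key", "newKey")

# Binding against the current schema resolves "v1" to field ID 4.
bound = bind_not_null(current, "v1")  # ('not_null', 4)

# Binding against the stale, manifest-time schema fails by name,
# which mirrors the ValidationException in the report.
try:
    bind_not_null(written, "v1")
    stale_error = None
except ValueError as e:
    stale_error = str(e)
```

In this model, the bound predicate refers to field ID 4, which is the same column the older schema calls "value1", so resolving names against the current table schema keeps renamed columns queryable. Whether the actual fix in the PR works exactly this way is not something this sketch claims.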