Re: Question about schema evolution in iceberg table

suds Fri, 15 Feb 2019 11:44:49 -0800

Thanks for the reply, Ryan.

I created a gist with a code example:


https://gist.github.com/sudssf/e5f2de7463487f98c0a269221bbe0f1a

Please let me know if I am not using the API correctly.
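
For anyone following the thread without opening the gist, here is a toy
model of what I understand the expected behavior to be. It is NOT the
Iceberg API: Field, Schema, and bind are hypothetical names made up for this
sketch. It only assumes what the scan output below suggests, namely that
columns carry stable numeric IDs (e.g. "4: v1") and that a rename changes
the name but not the ID. Under that assumption, a filter on "v1" binds fine
against the current schema and fails with a "Cannot find field" error only
when it is bound against the schema from before the rename.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Field:
    field_id: int  # stable ID; assumed to survive renames
    name: str      # current column name; may change
    type: str


class Schema:
    """Hypothetical stand-in for a table schema (not the Iceberg class)."""

    def __init__(self, fields):
        self.fields = tuple(fields)

    def find(self, name):
        for f in self.fields:
            if f.name == name:
                return f
        names = ", ".join(f"{f.field_id}: {f.name}" for f in self.fields)
        raise LookupError(f"Cannot find field '{name}' in struct<{names}>")

    def rename(self, old, new):
        """Return a new schema with one column renamed; IDs are preserved."""
        self.find(old)  # fail early if the column does not exist
        return Schema(replace(f, name=new) if f.name == old else f
                      for f in self.fields)


def bind(schema, column_name):
    """Resolve a filter's column name to a field ID (toy version of binding)."""
    return schema.find(column_name).field_id


original = Schema([
    Field(1, "id", "string"),
    Field(2, "value", "string"),
    Field(3, "key", "int"),
    Field(4, "value1", "string"),
    Field(5, "value2", "string"),
])
current = original.rename("value1", "v1").rename("key", "newKey")

# A row in an existing data file, modeled as {field_id: value}.
row = {1: "a", 2: "b", 3: 0, 4: "c", 5: "d"}

bound_id = bind(current, "v1")  # bind against the up-to-date schema
value = row[bound_id]           # reads the column originally written as "value1"

try:
    bind(original, "v1")        # bind against the stale, pre-rename schema
    stale_error = None
except LookupError as e:
    stale_error = str(e)
```

In this sketch the failure appears only when the filter is bound against
the pre-rename schema, which matches the old column names listed in the
exception below.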


On Thu, Feb 14, 2019 at 5:38 PM Ryan Blue <rb...@netflix.com> wrote:

> Sudsport,
>
> I'm wondering if you had the table cached somewhere? Those renames should
> work. My guess is that the query used a table version that was out of date.
>
> Can you put together a minimal script that reproduces the error and open
> an issue? That way I can fix it.
>
> rb
>
> On Thu, Feb 14, 2019 at 3:01 PM sudsport s <sudssf2...@gmail.com> wrote:
>
>> Adding dev@iceberg.apache.org
>>
>>
>> On Thu, Feb 14, 2019 at 3:00 PM sudsport s <sudssf2...@gmail.com> wrote:
>>
>>> Hi, I am doing some testing with schema evolution. I looked at the
>>> testSchemaUpdate method and the SchemaUpdate class for reference.
>>>
>>>
>>> Here are the steps I am following to test schema evolution validation:
>>>
>>> Initially, the data is created with the following schema, using "key" as
>>> the partition key:
>>>
>>> root
>>>  |-- id: string (nullable = true)
>>>  |-- value: string (nullable = true)
>>>  |-- key: integer (nullable = false)
>>>  |-- value1: string (nullable = true)
>>>  |-- value2: string (nullable = true)
>>>
>>> schema update to rename value1 -> v1
>>>
>>> root
>>>  |-- id: string (nullable = true)
>>>  |-- value: string (nullable = true)
>>>  |-- key: integer (nullable = false)
>>>  |-- v1: string (nullable = true)
>>>  |-- value2: string (nullable = true)
>>>
>>> schema update to rename key -> newKey (I know changing the partition key
>>> is not a good idea, but this is a test :) )
>>>
>>> root
>>>  |-- id: string (nullable = true)
>>>  |-- value: string (nullable = true)
>>>  |-- newKey: integer (nullable = false)
>>>  |-- v1: string (nullable = true)
>>>  |-- value2: string (nullable = true)
>>>
>>>
>>> When I read the data frame using Spark, I get the following schema:
>>>
>>> root
>>>  |-- id: string (nullable = true)
>>>  |-- value: string (nullable = true)
>>>  |-- newKey: integer (nullable = false)
>>>  |-- v1: string (nullable = true)
>>>  |-- value2: string (nullable = true)
>>>
>>> But when I try to run a query or scan that uses a renamed column in the
>>> where clause, I get the following exception:
>>>
>>>
>>> INFO TableScan: Scanning table /tmp/schema-evolution snapshot
>>> 1550184572006 created at 2019-02-14 14:49:32.189 with filter
>>> not_null(ref(name="v1"))
>>> Exception in thread "main"
>>> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute,
>>> tree:
>>> Exchange SinglePartition
>>> +- *(1) HashAggregate(keys=[], functions=[partial_count(1)],
>>> output=[count#77L])
>>>    +- *(1) Project
>>>       +- *(1) Filter (isnotnull(v1#60) && (cast(v1#60 as int) = 0))
>>>          +- *(1) DataSourceV2Scan [v1#60],
>>> IcebergScan(table=/tmp/schema-evolution, type=struct<4: v1: optional
>>> string>, filters=[not_null(ref(name="v1"))])
>>>
>>> at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
>>> at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:119)
>>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>>>
>>> Caused by: com.netflix.iceberg.exceptions.ValidationException: Cannot
>>> find field 'v1' in struct: struct<1: id: optional string, 2: value:
>>> optional string, 3: key: required int, 4: value1: optional string, 5:
>>> value2: optional string>
>>> at com.netflix.iceberg.exceptions.ValidationException.check(ValidationException.java:39)
>>> at com.netflix.iceberg.expressions.UnboundPredicate.bind(UnboundPredicate.java:46)
>>>
>>>
>>> I ran the same query with various where-clause combinations: "v1 = 0",
>>> "value1 = 0", "key = 0", and "newKey = 0".
>>>
>>> What is the best way to query data in an Iceberg table when the schema
>>> has changed?
>>>
>>>
>>> The following is the diff output from the metadata JSON:
>>>
>>>
>>> <       "name" : "key",
>>> ---
>>> >       "name" : "newKey",
>>> 25c25
>>> <       "name" : "value1",
>>> ---
>>> >       "name" : "v1",
>>>
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Iceberg Developers" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to iceberg-devel+unsubscr...@googlegroups.com.
>>> To post to this group, send email to iceberg-de...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/iceberg-devel/3efe985e-2302-412b-a899-8efe1fbf13c8%40googlegroups.com
>>> <https://groups.google.com/d/msgid/iceberg-devel/3efe985e-2302-412b-a899-8efe1fbf13c8%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
