Hi Sandeep,

It looks like the problem is a schema mismatch. Your dataset contains
columns that don't appear in your table schema, like tableLocation. When
Iceberg tries to match the dataset's schema to the table's schema, it can't
find those fields by name and hits an error.

I think if you add a select to drop the extra columns, it should work:

myDS.select("mwId", "mwVersion", "id", "id_str", "text", "created_at", "lang")
        .write()
        .format("iceberg")
        .mode("append")
        .save(getTableLocation());
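
If it helps to pinpoint the mismatch before writing, you can diff the two
sets of column names. A minimal sketch in plain Java (column names copied
from your printSchema output and your table schema; no Spark required):

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class SchemaDiff {
    // Columns reported by myDS.printSchema()
    static final Set<String> DATASET_COLUMNS = new TreeSet<>(Arrays.asList(
            "columns", "created_at", "encoder", "id", "id_str", "lang",
            "mwId", "mwVersion", "partitionSpec", "schema", "tableLocation", "text"));

    // Columns declared in the Iceberg table schema
    static final Set<String> TABLE_COLUMNS = new TreeSet<>(Arrays.asList(
            "mwId", "mwVersion", "id", "id_str", "text", "created_at", "lang"));

    // Dataset columns with no matching field in the table schema
    static Set<String> extraColumns() {
        Set<String> extra = new TreeSet<>(DATASET_COLUMNS);
        extra.removeAll(TABLE_COLUMNS);
        return extra;
    }

    public static void main(String[] args) {
        System.out.println("Extra columns: " + extraColumns());
        // prints: Extra columns: [columns, encoder, partitionSpec, schema, tableLocation]
    }
}
```

Anything the diff reports is a field that Iceberg can't find in the table
schema by name.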

Iceberg has code to catch the schema mismatch and throw an exception, but
it looks like it runs after the point where this is failing. We should fix
Iceberg to correctly assign IDs so you get a better error message. I’ll
open an issue for this.

A second fix is coming in the next Spark release. In 2.4, Spark doesn't
validate the schema of a dataset when writing. That is fixed in master and
will be in the 3.0 release.

rb

On Thu, Aug 22, 2019 at 11:39 AM Sandeep Sagar <sandeep.sa...@meltwater.com>
wrote:

> Hello,
> Newbie here. Need help figuring out this issue.
> I'm doing a simple local Spark save using Iceberg with S3.
> I see that my metadata folder was created in S3, so my schema/table
> creation was successful.
> When I try to run a Spark Write, I get a NullPointerException at
>
> java.lang.NullPointerException
> at org.apache.iceberg.types.ReassignIds.field(ReassignIds.java:77)
> at org.apache.iceberg.types.ReassignIds.field(ReassignIds.java:28)
> at org.apache.iceberg.types.TypeUtil$VisitFieldFuture.get(TypeUtil.java:331)
> at com.google.common.collect.Iterators$6.transform(Iterators.java:783)
> at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:47)
> at com.google.common.collect.Iterators.addAll(Iterators.java:356)
> at com.google.common.collect.Lists.newArrayList(Lists.java:143)
> at com.google.common.collect.Lists.newArrayList(Lists.java:130)
> at org.apache.iceberg.types.ReassignIds.struct(ReassignIds.java:55)
> at org.apache.iceberg.types.ReassignIds.struct(ReassignIds.java:28)
> at org.apache.iceberg.types.TypeUtil.visit(TypeUtil.java:364)
> at org.apache.iceberg.types.TypeUtil$VisitFuture.get(TypeUtil.java:316)
> at org.apache.iceberg.types.ReassignIds.schema(ReassignIds.java:40)
> at org.apache.iceberg.types.ReassignIds.schema(ReassignIds.java:28)
> at org.apache.iceberg.types.TypeUtil.visit(TypeUtil.java:336)
> at org.apache.iceberg.types.TypeUtil.reassignIds(TypeUtil.java:137)
> at org.apache.iceberg.spark.SparkSchemaUtil.convert(SparkSchemaUtil.java:163)
> at org.apache.iceberg.spark.source.IcebergSource.validateWriteSchema(IcebergSource.java:146)
> at org.apache.iceberg.spark.source.IcebergSource.createWriter(IcebergSource.java:76)
> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:255)
> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
>
> Maybe I am making a mistake in Schema Creation?
>
> new Schema(
>         required(1, "mwId", Types.StringType.get()),
>         required(2, "mwVersion", Types.LongType.get()),
>         required(3, "id", Types.LongType.get()),
>         required(4, "id_str", Types.StringType.get()),
>         optional(5, "text", Types.StringType.get()),
>         optional(6, "created_at", Types.StringType.get()),
>         optional(7, "lang", Types.StringType.get())
> );
>
> PartitionSpec I used was PartitionSpec.unpartitioned();
>
> The write code I used was:
>
> Dataset<TweetItem> myDS;
>
> ...... (populate myDS)
>
> myDS.write()
>         .format("iceberg")
>         .mode("append")
>         .save(getTableLocation());
>
>
> If I do a PrintSchema, I get:
>
> root
>  |-- columns: struct (nullable = true)
>  |-- created_at: string (nullable = true)
>  |-- encoder: struct (nullable = true)
>  |-- id: long (nullable = true)
>  |-- id_str: string (nullable = true)
>  |-- lang: string (nullable = true)
>  |-- mwId: string (nullable = true)
>  |-- mwVersion: long (nullable = true)
>  |-- partitionSpec: struct (nullable = true)
>  |-- schema: struct (nullable = true)
>  |    |-- aliases: map (nullable = true)
>  |    |    |-- key: string
>  |    |    |-- value: integer (valueContainsNull = true)
>  |-- tableLocation: string (nullable = true)
>  |-- text: string (nullable = true)
>
> Appreciate your help.
>
> regards
> Sandeep
>



-- 
Ryan Blue
Software Engineer
Netflix
