[GitHub] [incubator-iceberg] waterlx edited a comment on issue #457: DataFrame generated by Seq() might have schema conflict with Iceberg

GitBox Thu, 05 Sep 2019 23:21:26 -0700

waterlx edited a comment on issue #457: DataFrame generated by Seq() might have 
schema conflict with Iceberg
URL: 
https://github.com/apache/incubator-iceberg/issues/457#issuecomment-528725863
 
 
   A String column in Spark is nullable by default (nullable = true), and 
Iceberg uses the following code to decide whether a field is required or 
optional based on nullable():
   ``` java
   // public Type struct(StructType struct, List<Type> types) in SparkTypeToType.java
   if (field.nullable()) {
     newFields.add(Types.NestedField.optional(id, field.name(), type, doc));
   } else {
     newFields.add(Types.NestedField.required(id, field.name(), type, doc));
   }
   ```
   When the Iceberg schema declares a StringType field as "required", the 
nullable Spark column maps to an optional field instead, so the two schemas 
conflict and the error message is reported.
   This can be resolved by explicitly setting "nullable" in the Spark schema, 
for example:
   ``` scala
   import org.apache.spark.sql.types.{StringType, StructField, StructType}

   val schema = StructType(List(
       StructField("string_column", StringType, nullable = false),
       ...
   ))
   ```
   However, I do not know how to resolve it when the DataFrame is generated by 
Seq() or loaded from a file.
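
   A minimal sketch of one approach that might help in those cases, not 
confirmed against Iceberg in this thread: rebuild the DataFrame from its own 
rows with a copy of the schema in which the relevant columns are marked 
nullable = false. The object name `RequiredColumns` and the helper 
`withRequired` are hypothetical names introduced for illustration; the sketch 
assumes a local SparkSession and that the listed columns really contain no 
nulls, since Spark does not validate the data when a schema is applied this way.
   ``` scala
   import org.apache.spark.sql.{DataFrame, SparkSession}
   import org.apache.spark.sql.types.StructType

   object RequiredColumns {
     // Hypothetical helper: returns a DataFrame with the same rows as df, but
     // with the named columns marked nullable = false in the schema.
     // Assumption: those columns contain no nulls (Spark does not check here).
     def withRequired(spark: SparkSession, df: DataFrame, required: Set[String]): DataFrame = {
       val fields = df.schema.fields.map { f =>
         if (required.contains(f.name)) f.copy(nullable = false) else f
       }
       spark.createDataFrame(df.rdd, StructType(fields))
     }

     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder()
         .master("local[1]")
         .appName("required-columns")
         .getOrCreate()
       import spark.implicits._

       // Seq() produces a String column that is nullable by default.
       val df = Seq("a", "b").toDF("string_column")

       // Same data, but string_column is now declared non-nullable.
       val fixed = withRequired(spark, df, Set("string_column"))
       fixed.printSchema()

       spark.stop()
     }
   }
   ```
   The same helper could in principle be applied to a DataFrame loaded from a 
file, since it only depends on `df.schema` and `df.rdd`; whether the result 
then passes Iceberg's schema check would still need to be verified.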


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org
