sebbegg commented on issue #1031: URL: https://github.com/apache/sedona/issues/1031#issuecomment-1736797691
Hi @jiayuasu, thanks for the feedback, we'll give it a try. I get the point that `ArrayType[byte]` is basically the same as `BinaryType`, but it seems that the parquet reader doesn't "understand" this.

We read/write data with delta (https://delta.io/), which is basically parquet files plus a transaction log. So in the end this should be re-using the plain Spark parquet data sources, as is, I think, visible in the traceback:

```
Caused by: org.apache.spark.sql.AnalysisException: Invalid Spark read type: expected optional group gps_dr_position (LIST) {
  repeated group list {
    required int32 element (INTEGER(8,true));
  }
} to be list type but found Some(BinaryType)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$.checkConversionRequirement(ParquetSchemaConverter.scala:728)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.$anonfun$convertGroupField$3(ParquetSchemaConverter.scala:325)
	at scala.Option.fold(Option.scala:251)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertGroupField(ParquetSchemaConverter.scala:306)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertField(ParquetSchemaConverter.scala:174)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.$anonfun$convertInternal$3(ParquetSchemaConverter.scala:133)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.$anonfun$convertInternal$3$adapted(ParquetSchemaConverter.scala:103)
```
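For anyone hitting the same mismatch: one possible workaround (not from the original thread; the helper name and the `withColumn` usage are illustrative) is to rewrite the `array<tinyint>` column as a real `BinaryType` column before writing, so parquet stores it as `BINARY` instead of a `LIST` of `INTEGER(8,true)`. A minimal sketch, assuming the conversion happens on the write path:

```python
def int8_list_to_binary(xs):
    """Convert a list of signed int8 values (Spark ByteType, -128..127) to raw bytes.

    Python's bytes() only accepts 0..255, so mask each value with 0xFF.
    Returns None for null input so the column stays nullable.
    """
    if xs is None:
        return None
    return bytes((x & 0xFF) for x in xs)


# Registering this as a UDF requires pyspark; guarded so the helper above
# stays usable on its own. Names below are illustrative, not from the issue.
try:
    from pyspark.sql import functions as F
    from pyspark.sql.types import BinaryType

    to_binary = F.udf(int8_list_to_binary, BinaryType())
    # Example usage on a DataFrame `df` with an array<tinyint> column:
    # df = df.withColumn("gps_dr_position", to_binary("gps_dr_position"))
except ImportError:
    pass
```

This is a sketch, not a tested fix for the delta/Sedona setup in this issue; whether it is acceptable depends on whether downstream readers expect the list encoding.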
