sebbegg commented on issue #1031: URL: https://github.com/apache/sedona/issues/1031#issuecomment-1736797691
Hi @jiayuasu, thanks for the feedback, we'll give it a try. I get the point that `ArrayType[byte]` is basically the same as `BinaryType`, but it seems that the parquet reader doesn't "understand" this.

We read/write data with delta (https://delta.io/), which is basically parquet files plus a transaction log. So in the end this should be re-using the plain Spark parquet data sources, as is, I think, visible in the traceback:

```
Caused by: org.apache.spark.sql.AnalysisException: Invalid Spark read type: expected optional group gps_dr_position (LIST) {
  repeated group list {
    required int32 element (INTEGER(8,true));
  }
} to be list type but found Some(BinaryType)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$.checkConversionRequirement(ParquetSchemaConverter.scala:728)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.$anonfun$convertGroupField$3(ParquetSchemaConverter.scala:325)
	at scala.Option.fold(Option.scala:251)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertGroupField(ParquetSchemaConverter.scala:306)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertField(ParquetSchemaConverter.scala:174)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.$anonfun$convertInternal$3(ParquetSchemaConverter.scala:133)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.$anonfun$convertInternal$3$adapted(ParquetSchemaConverter.scala:103)
```
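For anyone hitting the same mismatch: one possible workaround (not from the original thread; the helper name and the `withColumn` usage are illustrative) is to rewrite the `array<tinyint>` column as a real `BinaryType` column before writing, so parquet stores it as `BINARY` instead of a `LIST` of `INTEGER(8,true)`. A minimal sketch, assuming the conversion happens on the write path:

```python
def int8_list_to_binary(xs):
    """Convert a list of signed int8 values (Spark ByteType, -128..127) to raw bytes.

    Python's bytes() only accepts 0..255, so mask each value with 0xFF.
    Returns None for null input so the column stays nullable.
    """
    if xs is None:
        return None
    return bytes((x & 0xFF) for x in xs)


# Registering this as a UDF requires pyspark; guarded so the helper above
# stays usable on its own. Names below are illustrative, not from the issue.
try:
    from pyspark.sql import functions as F
    from pyspark.sql.types import BinaryType

    to_binary = F.udf(int8_list_to_binary, BinaryType())
    # Example usage on a DataFrame `df` with an array<tinyint> column:
    # df = df.withColumn("gps_dr_position", to_binary("gps_dr_position"))
except ImportError:
    pass
```

This is a sketch, not a tested fix for the delta/Sedona setup in this issue; whether it is acceptable depends on whether downstream readers expect the list encoding.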
