wForget commented on PR #2828: URL: https://github.com/apache/datafusion-comet/pull/2828#issuecomment-3584067627
@andygrove Thank you for your work. The current implementation fully follows the FileCommitProtocol specification and works well for `InsertIntoHadoopFsRelationCommand`. However, it has some known limitations that may keep it from generalizing: `InsertIntoHadoopFsRelationCommand` does not create a Spark staging directory for non-dynamic partition writes, so concurrent writes are not supported in the non-dynamic partition scenario. For that reason, I would prefer to always create a staging directory for native writes.

References:
https://issues.apache.org/jira/browse/SPARK-37210
https://github.com/apache/spark/blob/161ed3d18dc346d3ad970b7a5997e42ea05b5206/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala#L175-L180

BTW, I usually disable Hive datasource conversion on writes:

```
spark.sql.hive.convertInsertingPartitionedTable false
spark.sql.hive.convertInsertingUnpartitionedTable false
spark.sql.hive.convertMetastoreInsertDir false
spark.sql.hive.convertMetastoreCtas false
```
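For illustration, here is a minimal sketch of the staging-directory idea described above (hypothetical, not the actual Spark or Comet implementation; the object name `StagingWriteSketch`, the `commitJob` helper, and the `.staging-<jobId>` suffix are placeholders): tasks write under a job-scoped staging path, and output files are only renamed into the final destination on commit, so concurrent jobs never write into the same live directory.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical sketch of a staging-directory commit, in the spirit of SPARK-37210.
object StagingWriteSketch {
  def commitJob(conf: Configuration, outputPath: String, jobId: String): Unit = {
    val finalDir = new Path(outputPath)
    val fs: FileSystem = finalDir.getFileSystem(conf)
    // Job-scoped staging directory, similar in spirit to Spark's ".spark-staging-<jobId>".
    val stagingDir = new Path(finalDir, s".staging-$jobId")

    // Publish each committed file by renaming it from the staging directory
    // into the final destination.
    fs.listStatus(stagingDir).foreach { status =>
      val target = new Path(finalDir, status.getPath.getName)
      if (!fs.rename(status.getPath, target)) {
        throw new java.io.IOException(s"Failed to commit ${status.getPath} to $target")
      }
    }
    // Remove the staging directory once everything has been published.
    fs.delete(stagingDir, true)
  }
}
```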
