wForget commented on PR #2828:
URL: https://github.com/apache/datafusion-comet/pull/2828#issuecomment-3584067627

   @andygrove Thank you for your work. The current implementation fully follows the FileCommitProtocol specification, and it works well for `InsertIntoHadoopFsRelationCommand`. However, it has some known issues that keep it from being fully general: `InsertIntoHadoopFsRelationCommand` does not create a Spark staging directory for non-dynamic-partition writes, so it does not support concurrent writes in non-dynamic-partition scenarios. Therefore, I would prefer to always create a staging directory for native writes.
   
   References:
   https://issues.apache.org/jira/browse/SPARK-37210
   
https://github.com/apache/spark/blob/161ed3d18dc346d3ad970b7a5997e42ea05b5206/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala#L175-L180
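   
   To illustrate the idea, here is a minimal sketch (not Comet's actual implementation; the helper name, the directory naming, and the paths are hypothetical) of what "always create a staging directory for native writes" could look like: tasks write under a job-scoped staging directory, and files are only moved to the final output path on commit, so concurrent writes to the same non-dynamic-partition output path do not clobber each other's files.
   
   ```scala
   import java.util.UUID
   import org.apache.hadoop.fs.Path
   
   // Hypothetical helper: derive a job-scoped staging directory under the
   // final output path. The ".spark-staging-<jobId>" naming mirrors what
   // Spark uses for dynamic partition overwrite, but here it would apply
   // to every native write.
   def stagingDir(outputPath: Path, jobId: String): Path =
     new Path(outputPath, s".spark-staging-$jobId")
   
   val finalPath = new Path("/warehouse/db/my_table") // illustrative path
   val staging   = stagingDir(finalPath, UUID.randomUUID().toString)
   
   // Tasks write their files under `staging`; on job commit the files are
   // moved into `finalPath` and `staging` is deleted (on abort, only
   // `staging` needs to be removed), so a failed or concurrent job cannot
   // leave partial files in the final location.
   ```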
   
   BTW, I usually disable Hive datasource conversion on writes:
   
   ```
   spark.sql.hive.convertInsertingPartitionedTable    false
   spark.sql.hive.convertInsertingUnpartitionedTable  false
   spark.sql.hive.convertMetastoreInsertDir           false
   spark.sql.hive.convertMetastoreCtas                false
   ```
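   
   For example (assuming a session built with Hive support), the same settings can be applied programmatically when creating the SparkSession:
   
   ```scala
   import org.apache.spark.sql.SparkSession
   
   // Disable Hive-to-datasource conversion for writes so inserts go through
   // the Hive SerDe path rather than InsertIntoHadoopFsRelationCommand.
   val spark = SparkSession.builder()
     .enableHiveSupport()
     .config("spark.sql.hive.convertInsertingPartitionedTable", "false")
     .config("spark.sql.hive.convertInsertingUnpartitionedTable", "false")
     .config("spark.sql.hive.convertMetastoreInsertDir", "false")
     .config("spark.sql.hive.convertMetastoreCtas", "false")
     .getOrCreate()
   ```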


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

