wForget commented on PR #2828: URL: https://github.com/apache/datafusion-comet/pull/2828#issuecomment-3584067627
@andygrove Thank you for your work. The current implementation fully follows the FileCommitProtocol specification and works well for `InsertIntoHadoopFsRelationCommand`. However, it has some known limitations that may keep it from generalizing: `InsertIntoHadoopFsRelationCommand` does not create a Spark staging directory for non-dynamic partition writes, so concurrent writes are not supported in the non-dynamic partition scenario. For that reason, I would prefer to always create a staging directory for native writes.

References:
https://issues.apache.org/jira/browse/SPARK-37210
https://github.com/apache/spark/blob/161ed3d18dc346d3ad970b7a5997e42ea05b5206/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala#L175-L180

BTW, I usually disable Hive datasource conversion on writes:

```
spark.sql.hive.convertInsertingPartitionedTable false
spark.sql.hive.convertInsertingUnpartitionedTable false
spark.sql.hive.convertMetastoreInsertDir false
spark.sql.hive.convertMetastoreCtas false
```
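For illustration, here is a minimal sketch of the staging-directory idea described above (hypothetical, not the actual Spark or Comet implementation; the object name `StagingWriteSketch`, the `commitJob` helper, and the `.staging-<jobId>` suffix are placeholders): tasks write under a job-scoped staging path, and output files are only renamed into the final destination on commit, so concurrent jobs never write into the same live directory.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical sketch of a staging-directory commit, in the spirit of SPARK-37210.
object StagingWriteSketch {
  def commitJob(conf: Configuration, outputPath: String, jobId: String): Unit = {
    val finalDir = new Path(outputPath)
    val fs: FileSystem = finalDir.getFileSystem(conf)
    // Job-scoped staging directory, similar in spirit to Spark's ".spark-staging-<jobId>".
    val stagingDir = new Path(finalDir, s".staging-$jobId")

    // Publish each committed file by renaming it from the staging directory
    // into the final destination.
    fs.listStatus(stagingDir).foreach { status =>
      val target = new Path(finalDir, status.getPath.getName)
      if (!fs.rename(status.getPath, target)) {
        throw new java.io.IOException(s"Failed to commit ${status.getPath} to $target")
      }
    }
    // Remove the staging directory once everything has been published.
    fs.delete(stagingDir, true)
  }
}
```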
