pitrou commented on code in PR #12701:
URL: https://github.com/apache/arrow/pull/12701#discussion_r850219507
##########
cpp/src/arrow/dataset/file_base.h:
##########
@@ -387,6 +387,8 @@ struct ARROW_DS_EXPORT FileSystemDatasetWriteOptions {
/// Controls what happens if an output directory already exists.
ExistingDataBehavior existing_data_behavior = ExistingDataBehavior::kError;
+ bool create_dir = true;
Review Comment:
Can you add a docstring for this?
##########
python/pyarrow/dataset.py:
##########
@@ -753,7 +753,7 @@ def write_dataset(data, base_dir, basename_template=None,
format=None,
max_partitions=None, max_open_files=None,
max_rows_per_file=None, min_rows_per_group=None,
max_rows_per_group=None, file_visitor=None,
- existing_data_behavior='error'):
+ existing_data_behavior='error', create_dir=True):
Review Comment:
Can you add the corresponding parameter description in the docstring below?
##########
cpp/src/arrow/dataset/dataset_writer.cc:
##########
@@ -476,7 +478,8 @@ class DatasetWriter::DatasetWriterImpl : public
util::AsyncDestroyable {
}
Future<> DoWriteRecordBatch(std::shared_ptr<RecordBatch> batch,
- const std::string& directory, const std::string&
prefix) {
+ const std::string& directory, const std::string&
prefix,
+ const bool& create_dir) {
Review Comment:
No need to pass primitive types by reference.
```suggestion
bool create_dir) {
```
##########
python/pyarrow/tests/test_dataset.py:
##########
@@ -4286,6 +4287,55 @@ def test_write_dataset_s3(s3_example_simple):
assert result.equals(table)
+_minio_put_only_policy = """{
+ "Version": "2012-10-17",
+ "Statement": [
+ {
+ "Effect": "Allow",
+ "Action": [
+ "s3:PutObject",
+ "s3:ListBucket",
+ "s3:GetObjectVersion"
+ ],
+ "Resource": [
+ "arn:aws:s3:::*"
+ ]
+ }
+ ]
+}"""
+
+
[email protected]
[email protected]
+def test_write_dataset_s3_put_only(s3_server, limited_s3_user):
Review Comment:
Can you add a comment explaining what this test is for, and referencing the
JIRA issue number?
##########
cpp/src/arrow/dataset/dataset_writer.cc:
##########
@@ -476,7 +478,8 @@ class DatasetWriter::DatasetWriterImpl : public
util::AsyncDestroyable {
}
Future<> DoWriteRecordBatch(std::shared_ptr<RecordBatch> batch,
- const std::string& directory, const std::string&
prefix) {
+ const std::string& directory, const std::string&
prefix,
+ const bool& create_dir) {
Review Comment:
Also, I may be misunderstanding, but this parameter doesn't seem actually
used?
##########
python/pyarrow/tests/conftest.py:
##########
@@ -311,3 +313,23 @@ def s3_server(s3_connection):
finally:
if proc is not None:
proc.kill()
+
+
[email protected](scope='session')
+def limited_s3_user(request, s3_server):
Review Comment:
Also, why is this a fixture if it's invoked as a regular function?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]