alexisdondon opened a new issue, #42173:
URL: https://github.com/apache/arrow/issues/42173
### Describe the bug, including details regarding any error messages, version, and platform.
Given a dataset, if I try to write it as a partitioned Parquet dataset to an
on-premises S3-compatible store such as MinIO, at a path like
s3/mybucket/data/mydataset:
```
# Number of rows
n <- 100000

# Generate random numeric values for two columns
set.seed(123) # For reproducibility
num_col1 <- runif(n, min = 0, max = 100) # Uniform values between 0 and 100
num_col2 <- rnorm(n, mean = 50, sd = 10) # Normally distributed, mean 50, sd 10

# Generate random character strings for one column
char_col <- replicate(n, paste0(sample(LETTERS, 5, replace = TRUE), collapse = ""))

# Generate categorical values for one column
qual_col <- sample(c("A", "B", "C", "D"), n, replace = TRUE)

# Build the data.frame
df <- data.frame(
  numeric1 = num_col1,
  numeric2 = num_col2,
  character = char_col,
  qualitative = qual_col,
  stringsAsFactors = FALSE
)

# Show the first rows of the data.frame
head(df)

# Configure S3 access
minio <- arrow::S3FileSystem$create(
  endpoint_override = Sys.getenv("S3_ENDPOINT"),
  access_key = Sys.getenv("AWS_ACCESS_KEY_ID"),
  secret_key = Sys.getenv("AWS_SECRET_ACCESS_KEY"),
  session_token = Sys.getenv("AWS_SESSION_TOKEN")
)

# Write the dataset to S3, partitioned by the qualitative column
df |> arrow::write_dataset(
  minio$path("mybucket/data/mydataset"),
  partitioning = "qualitative",
  format = "parquet"
)
```
The write then fails because a HEAD request on the bucket is denied. Granting
the user `s3:ListBucket` on mybucket resolves the error, but granting
ListBucket is not without security impact.
```
Error: IOError: When testing for existence of bucket 'mybucket': AWS Error
ACCESS_DENIED during HeadBucket operation: No response body.
```
There have been some discussions/issues about Arrow having a mode that skips
the bucket existence check, or that does not create the bucket if it does not
exist.
With pyarrow and the same ACL on S3, I can write the dataset: the wrapper
either does not check for existence, or at least checks without a HEAD request.
### Component(s)
R