[
https://issues.apache.org/jira/browse/ARROW-14930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17455884#comment-17455884
]
Luis Morales commented on ARROW-14930:
--------------------------------------
I would say there is no problem on the server side. My thoughts on this:
*scality.get_file_info("dasynth/parquet/")*
This method uses HEAD operations to ask for buckets or objects, but in
this case dasynth/parquet is neither: it is just a prefix (or folder, or
tag; call it whatever you like). That is why the server answers with
object not found.
FileSelector, by contrast, does not first check whether the object exists. It
just asks for the contents with a GET method, and in that case it works properly.
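To illustrate the difference, here is a toy in-memory model of a flat
S3-style key namespace. It is not pyarrow and not a real S3 client; the
names BUCKETS, head and list_prefix and the example file name are made up
for this sketch. It only assumes that a HEAD succeeds for a bucket or an
exact object key, while a listing GET matches keys by prefix.

```python
# Toy model (assumption): one bucket holding a single object key.
# There is no "folder" entry; "parquet/" exists only as part of the key.
BUCKETS = {
    "dasynth": {
        "parquet/taxies/2019/month_year=2001-01/payment_type=1/example.snappy.parquet",
    },
}


def head(path):
    """Mimic a HEAD request: True only for a bucket or an exact object key."""
    bucket, _, key = path.partition("/")
    if bucket not in BUCKETS:
        return False
    if key == "":
        return True  # HEAD on the bucket itself
    return key in BUCKETS[bucket]


def list_prefix(path):
    """Mimic a listing GET: return every key that starts with the prefix."""
    bucket, _, prefix = path.partition("/")
    return sorted(k for k in BUCKETS.get(bucket, ()) if k.startswith(prefix))


print(head("dasynth/parquet/"))         # the prefix is not an object
print(list_prefix("dasynth/parquet/"))  # but listing by prefix finds the file
```

In this model the HEAD-style lookup on the prefix fails while the listing
succeeds, which matches what I see with get_file_info and FileSelector.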
Maybe a new parameter, object_type = [bucket, object, tag], with different
logic applied in each case, would solve the problem:
bucket, object -> HEAD methods
tag -> the same logic as if it were using FileSelector
Additionally, the dataset() method would need to change to follow the same
idea.
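A rough sketch of the dispatch I have in mind, again on a toy in-memory
store rather than pyarrow. ObjectType, exists, BUCKET_NAMES and OBJECT_KEYS
are hypothetical names for illustration only; no such parameter exists in
pyarrow today.

```python
from enum import Enum


class ObjectType(Enum):
    """Hypothetical values for the proposed object_type parameter."""
    BUCKET = "bucket"
    OBJECT = "object"
    TAG = "tag"


# Toy store (assumption): known buckets and full object paths.
BUCKET_NAMES = {"dasynth"}
OBJECT_KEYS = {
    "dasynth/parquet/taxies/2019/month_year=2001-01/payment_type=1/example.snappy.parquet",
}


def exists(path, object_type):
    """Pick the lookup strategy from object_type."""
    if object_type is ObjectType.BUCKET:
        # HEAD-style check on the bucket name.
        return path.rstrip("/") in BUCKET_NAMES
    if object_type is ObjectType.OBJECT:
        # HEAD-style check: exact key match only.
        return path in OBJECT_KEYS
    # TAG: listing-style check, true if any key lives under the prefix.
    prefix = path if path.endswith("/") else path + "/"
    return any(key.startswith(prefix) for key in OBJECT_KEYS)


print(exists("dasynth/parquet/", ObjectType.OBJECT))
print(exists("dasynth/parquet/", ObjectType.TAG))
```

With this kind of dispatch, the same path that fails as an object would be
found when treated as a tag.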
An additional example: if you use get_file_info with a file like this:
scality.get_file_info("dasynth/parquet/taxies/2019/month_year=2001-01/payment_type=1/9ccd9d4ae28a41e1acaf40ea594b61da.snappy.parquet")
it works, despite the intermediate folders parquet, taxies, 2019...
> [Python] FileNotFound when using bucket+folders in S3 + partitioned parquet
> ---------------------------------------------------------------------------
>
> Key: ARROW-14930
> URL: https://issues.apache.org/jira/browse/ARROW-14930
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 6.0.1
> Environment: linux + python 3.8
> Reporter: Luis Morales
> Priority: Trivial
> Fix For: 6.0.2
>
>
> When using dataset.Dataset with S3FileSystem against an S3-compatible object
> storage, I get a FileNotFoundError.
>
> My code:
>
> scality = fs.S3FileSystem(access_key='accessKey1',
> secret_key='verySecretKey1', endpoint_override="http://localhost:8000",
> region="")
> data = ds.dataset("dasynth/parquet/taxies/2019_june/", format="parquet",
> partitioning="hive", filesystem=scality)