mmwinther commented on issue #30481:
URL: https://github.com/apache/arrow/issues/30481#issuecomment-1441883373
We are seeing the same bug with PyArrow v11. The same
partitioned directory reads correctly with:
- PyArrow v10
- Use of `use_legacy_dataset=False`
## Code example
```python
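import gcsfs
import pyarrow.parquet as pq  # ParquetDataset lives in pyarrow.parquet

gcs = gcsfs.GCSFileSystem()
partition_base_dir_path = "<redacted-bucket-name>/partition/dir"
# This call triggers the bug with PyArrow v11
parquet_ds = pq.ParquetDataset(partition_base_dir_path, filesystem=gcs)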
```
## Details of filesystem object
```
gcs.isdir(partition_base_dir_path)
>> True
```
```
gcs.exists(partition_base_dir_path)
>> True
```
```
gcs.info(partition_base_dir_path)
>> {'bucket': '<redacted-bucket-name>',
'name': '<redacted-bucket-name>/partition/dir',
'size': 0,
'storageClass': 'DIRECTORY',
'type': 'directory'}
```
```
gcs.find(partition_base_dir_path, maxdepth=None, withdirs=True, detail=True)
{'<redacted-bucket-name>/partition/dir': {'Key': 'partition/dir',
'Size': 0,
'name': '<redacted-bucket-name>/partition/dir',
'StorageClass': 'DIRECTORY',
'type': 'directory',
'size': 0},
'<redacted-bucket-name>/partition/dir/': {'kind': 'storage#object',
'id': '<redacted-bucket-name>/partition/dir//1645614930653539',
'selfLink':
'https://www.googleapis.com/storage/v1/b/<redacted-bucket-name>/o/partition%2Fdir%2F',
'mediaLink':
'https://storage.googleapis.com/download/storage/v1/b/<redacted-bucket-name>/o/partition%2Fdir%2F?generation=1645614930653539&alt=media',
'name': '<redacted-bucket-name>/partition/dir/',
'bucket': '<redacted-bucket-name>',
'generation': '1645614930653539',
'metageneration': '1',
'contentType': 'application/octet-stream',
'storageClass': 'STANDARD',
'size': 0,
'md5Hash': '1B2M2Y8AsgTpgAmY7PhCfg==',
'crc32c': 'AAAAAA==',
'etag': 'COPig6vZlfYCEAE=',
'timeCreated': '2022-02-23T11:15:30.656Z',
'updated': '2022-02-23T11:15:30.656Z',
'timeStorageClassUpdated': '2022-02-23T11:15:30.656Z',
'type': 'file'},
'<redacted-bucket-name>/partition/dir/_SUCCESS': {'kind': 'storage#object',
'id': '<redacted-bucket-name>/partition/dir/_SUCCESS/1645614930849546',
'selfLink':
'https://www.googleapis.com/storage/v1/b/<redacted-bucket-name>/o/partition%2Fdir%2F_SUCCESS',
'mediaLink':
'https://storage.googleapis.com/download/storage/v1/b/<redacted-bucket-name>/o/partition%2Fdir%2F_SUCCESS?generation=1645614930849546&alt=media',
'name': '<redacted-bucket-name>/partition/dir/_SUCCESS',
'bucket': '<redacted-bucket-name>',
'generation': '1645614930849546',
'metageneration': '1',
'contentType': 'application/octet-stream',
'storageClass': 'STANDARD',
'size': 0,
'md5Hash': '1B2M2Y8AsgTpgAmY7PhCfg==',
'crc32c': 'AAAAAA==',
'etag': 'CIrej6vZlfYCEAE=',
'timeCreated': '2022-02-23T11:15:30.851Z',
'updated': '2022-02-23T11:15:30.851Z',
'timeStorageClassUpdated': '2022-02-23T11:15:30.851Z',
'type': 'file'},
'<redacted-bucket-name>/partition/dir/part-00000-d2924cb4-ae44-4401-8bfd-b67a4c9ab9ce-c000.snappy.parquet':
{'kind': 'storage#object',
'id':
'<redacted-bucket-name>/partition/dir/part-00000-d2924cb4-ae44-4401-8bfd-b67a4c9ab9ce-c000.snappy.parquet/1645614930433213',
'selfLink':
'https://www.googleapis.com/storage/v1/b/<redacted-bucket-name>/o/partition%2Fdir%2Fpart-00000-d2924cb4-ae44-4401-8bfd-b67a4c9ab9ce-c000.snappy.parquet',
'mediaLink':
'https://storage.googleapis.com/download/storage/v1/b/<redacted-bucket-name>/o/partition%2Fdir%2Fpart-00000-d2924cb4-ae44-4401-8bfd-b67a4c9ab9ce-c000.snappy.parquet?generation=1645614930433213&alt=media',
'name':
'<redacted-bucket-name>/partition/dir/part-00000-d2924cb4-ae44-4401-8bfd-b67a4c9ab9ce-c000.snappy.parquet',
'bucket': '<redacted-bucket-name>',
'generation': '1645614930433213',
'metageneration': '1',
'contentType': 'application/octet-stream',
'storageClass': 'STANDARD',
'size': 993,
'md5Hash': 'U/QwH2SM2un91crFW2lqWA==',
'crc32c': '1iQeEw==',
'etag': 'CL2p9qrZlfYCEAE=',
'timeCreated': '2022-02-23T11:15:30.435Z',
'updated': '2022-02-23T11:15:30.435Z',
'timeStorageClassUpdated': '2022-02-23T11:15:30.435Z',
'type': 'file'},
'<redacted-bucket-name>/partition/dir/part-00001-d2924cb4-ae44-4401-8bfd-b67a4c9ab9ce-c000.snappy.parquet':
{'kind': 'storage#object',
'id':
'<redacted-bucket-name>/partition/dir/part-00001-d2924cb4-ae44-4401-8bfd-b67a4c9ab9ce-c000.snappy.parquet/1645614930179084',
'selfLink':
'https://www.googleapis.com/storage/v1/b/<redacted-bucket-name>/o/partition%2Fdir%2Fpart-00001-d2924cb4-ae44-4401-8bfd-b67a4c9ab9ce-c000.snappy.parquet',
'mediaLink':
'https://storage.googleapis.com/download/storage/v1/b/<redacted-bucket-name>/o/partition%2Fdir%2Fpart-00001-d2924cb4-ae44-4401-8bfd-b67a4c9ab9ce-c000.snappy.parquet?generation=1645614930179084&alt=media',
'name':
'<redacted-bucket-name>/partition/dir/part-00001-d2924cb4-ae44-4401-8bfd-b67a4c9ab9ce-c000.snappy.parquet',
'bucket': '<redacted-bucket-name>',
'generation': '1645614930179084',
'metageneration': '1',
'contentType': 'application/octet-stream',
'storageClass': 'STANDARD',
'size': 993,
'md5Hash': 'b5UF9wxWOK9lzAl9HnSMjw==',
'crc32c': '1Es0QA==',
'etag': 'CIzo5qrZlfYCEAE=',
'timeCreated': '2022-02-23T11:15:30.188Z',
'updated': '2022-02-23T11:15:30.188Z',
'timeStorageClassUpdated': '2022-02-23T11:15:30.188Z',
'type': 'file'}}
```
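One thing that stands out in the listing above is the zero-byte object whose name ends in `/` (`partition/dir/`), reported with `'type': 'file'` alongside the directory entry itself. I have not confirmed that this object is what triggers the bug. The snippet below is only a sketch for spotting such entries in a `find(..., detail=True)` style result. The helper names `find_directory_markers` and `data_files` are made up for illustration, and the dict is a trimmed copy of the output above that keeps only the `type` and `size` fields.

```python
def find_directory_markers(listing):
    """Return names of zero-byte entries typed as 'file' whose name ends in '/'."""
    return sorted(
        name
        for name, info in listing.items()
        if info.get("type") == "file"
        and info.get("size", 0) == 0
        and name.endswith("/")
    )


def data_files(listing, suffix=".parquet"):
    """Return names of entries typed as 'file' that end with the given suffix."""
    return sorted(
        name
        for name, info in listing.items()
        if info.get("type") == "file" and name.endswith(suffix)
    )


# Trimmed copy of the gcs.find() output above (only 'type' and 'size' kept).
base = "<redacted-bucket-name>/partition/dir"
listing = {
    base: {"type": "directory", "size": 0},
    base + "/": {"type": "file", "size": 0},
    base + "/_SUCCESS": {"type": "file", "size": 0},
    base + "/part-00000-d2924cb4-ae44-4401-8bfd-b67a4c9ab9ce-c000.snappy.parquet": {
        "type": "file",
        "size": 993,
    },
    base + "/part-00001-d2924cb4-ae44-4401-8bfd-b67a4c9ab9ce-c000.snappy.parquet": {
        "type": "file",
        "size": 993,
    },
}

markers = find_directory_markers(listing)
parquet_files = data_files(listing)
print("possible directory markers:", markers)
print("parquet data files:", parquet_files)
```

With a live filesystem, the same helpers could be applied directly to the dict returned by `gcs.find(partition_base_dir_path, withdirs=True, detail=True)`.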