mmwinther commented on issue #30481:
URL: https://github.com/apache/arrow/issues/30481#issuecomment-1441883373

   We see the same bug with PyArrow v11. The same partitioned directory loads correctly with:
   
   - PyArrow v10
   - Use of `use_legacy_dataset=False`
   
   ## Code example
   ```python
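   import gcsfs
   import pyarrow.parquet as pq  # ParquetDataset lives in pyarrow.parquet

   gcs = gcsfs.GCSFileSystem()
   # Base directory of the partitioned dataset (also used in the calls below)
   partition_base_dir_path = "<redacted-bucket-name>/partition/dir"
   parquet_ds = pq.ParquetDataset(partition_base_dir_path, filesystem=gcs)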
   ```
   
   ## Details of filesystem object
   ```
   gcs.isdir(partition_base_dir_path)
   >> True
   ```
   ```
   gcs.exists(partition_base_dir_path)
   >> True
   ```
   ```
   gcs.info(partition_base_dir_path)
   >> {'bucket': '<redacted-bucket-name>',
       'name': '<redacted-bucket-name>/partition/dir',
       'size': 0,
       'storageClass': 'DIRECTORY',
       'type': 'directory'}
   ```
   ```
   gcs.find(partition_base_dir_path, maxdepth=None, withdirs=True, detail=True)
   {'<redacted-bucket-name>/partition/dir': {'Key': 'partition/dir',
     'Size': 0,
     'name': '<redacted-bucket-name>/partition/dir',
     'StorageClass': 'DIRECTORY',
     'type': 'directory',
     'size': 0},
    '<redacted-bucket-name>/partition/dir/': {'kind': 'storage#object',
     'id': '<redacted-bucket-name>/partition/dir//1645614930653539',
     'selfLink': 'https://www.googleapis.com/storage/v1/b/<redacted-bucket-name>/o/partition%2Fdir%2F',
     'mediaLink': 'https://storage.googleapis.com/download/storage/v1/b/<redacted-bucket-name>/o/partition%2Fdir%2F?generation=1645614930653539&alt=media',
     'name': '<redacted-bucket-name>/partition/dir/',
     'bucket': '<redacted-bucket-name>',
     'generation': '1645614930653539',
     'metageneration': '1',
     'contentType': 'application/octet-stream',
     'storageClass': 'STANDARD',
     'size': 0,
     'md5Hash': '1B2M2Y8AsgTpgAmY7PhCfg==',
     'crc32c': 'AAAAAA==',
     'etag': 'COPig6vZlfYCEAE=',
     'timeCreated': '2022-02-23T11:15:30.656Z',
     'updated': '2022-02-23T11:15:30.656Z',
     'timeStorageClassUpdated': '2022-02-23T11:15:30.656Z',
     'type': 'file'},
    '<redacted-bucket-name>/partition/dir/_SUCCESS': {'kind': 'storage#object',
     'id': '<redacted-bucket-name>/partition/dir/_SUCCESS/1645614930849546',
     'selfLink': 'https://www.googleapis.com/storage/v1/b/<redacted-bucket-name>/o/partition%2Fdir%2F_SUCCESS',
     'mediaLink': 'https://storage.googleapis.com/download/storage/v1/b/<redacted-bucket-name>/o/partition%2Fdir%2F_SUCCESS?generation=1645614930849546&alt=media',
     'name': '<redacted-bucket-name>/partition/dir/_SUCCESS',
     'bucket': '<redacted-bucket-name>',
     'generation': '1645614930849546',
     'metageneration': '1',
     'contentType': 'application/octet-stream',
     'storageClass': 'STANDARD',
     'size': 0,
     'md5Hash': '1B2M2Y8AsgTpgAmY7PhCfg==',
     'crc32c': 'AAAAAA==',
     'etag': 'CIrej6vZlfYCEAE=',
     'timeCreated': '2022-02-23T11:15:30.851Z',
     'updated': '2022-02-23T11:15:30.851Z',
     'timeStorageClassUpdated': '2022-02-23T11:15:30.851Z',
     'type': 'file'},
    '<redacted-bucket-name>/partition/dir/part-00000-d2924cb4-ae44-4401-8bfd-b67a4c9ab9ce-c000.snappy.parquet': {'kind': 'storage#object',
     'id': '<redacted-bucket-name>/partition/dir/part-00000-d2924cb4-ae44-4401-8bfd-b67a4c9ab9ce-c000.snappy.parquet/1645614930433213',
     'selfLink': 'https://www.googleapis.com/storage/v1/b/<redacted-bucket-name>/o/partition%2Fdir%2Fpart-00000-d2924cb4-ae44-4401-8bfd-b67a4c9ab9ce-c000.snappy.parquet',
     'mediaLink': 'https://storage.googleapis.com/download/storage/v1/b/<redacted-bucket-name>/o/partition%2Fdir%2Fpart-00000-d2924cb4-ae44-4401-8bfd-b67a4c9ab9ce-c000.snappy.parquet?generation=1645614930433213&alt=media',
     'name': '<redacted-bucket-name>/partition/dir/part-00000-d2924cb4-ae44-4401-8bfd-b67a4c9ab9ce-c000.snappy.parquet',
     'bucket': '<redacted-bucket-name>',
     'generation': '1645614930433213',
     'metageneration': '1',
     'contentType': 'application/octet-stream',
     'storageClass': 'STANDARD',
     'size': 993,
     'md5Hash': 'U/QwH2SM2un91crFW2lqWA==',
     'crc32c': '1iQeEw==',
     'etag': 'CL2p9qrZlfYCEAE=',
     'timeCreated': '2022-02-23T11:15:30.435Z',
     'updated': '2022-02-23T11:15:30.435Z',
     'timeStorageClassUpdated': '2022-02-23T11:15:30.435Z',
     'type': 'file'},
    '<redacted-bucket-name>/partition/dir/part-00001-d2924cb4-ae44-4401-8bfd-b67a4c9ab9ce-c000.snappy.parquet': {'kind': 'storage#object',
     'id': '<redacted-bucket-name>/partition/dir/part-00001-d2924cb4-ae44-4401-8bfd-b67a4c9ab9ce-c000.snappy.parquet/1645614930179084',
     'selfLink': 'https://www.googleapis.com/storage/v1/b/<redacted-bucket-name>/o/partition%2Fdir%2Fpart-00001-d2924cb4-ae44-4401-8bfd-b67a4c9ab9ce-c000.snappy.parquet',
     'mediaLink': 'https://storage.googleapis.com/download/storage/v1/b/<redacted-bucket-name>/o/partition%2Fdir%2Fpart-00001-d2924cb4-ae44-4401-8bfd-b67a4c9ab9ce-c000.snappy.parquet?generation=1645614930179084&alt=media',
     'name': '<redacted-bucket-name>/partition/dir/part-00001-d2924cb4-ae44-4401-8bfd-b67a4c9ab9ce-c000.snappy.parquet',
     'bucket': '<redacted-bucket-name>',
     'generation': '1645614930179084',
     'metageneration': '1',
     'contentType': 'application/octet-stream',
     'storageClass': 'STANDARD',
     'size': 993,
     'md5Hash': 'b5UF9wxWOK9lzAl9HnSMjw==',
     'crc32c': '1Es0QA==',
     'etag': 'CIzo5qrZlfYCEAE=',
     'timeCreated': '2022-02-23T11:15:30.188Z',
     'updated': '2022-02-23T11:15:30.188Z',
     'timeStorageClassUpdated': '2022-02-23T11:15:30.188Z',
     'type': 'file'}}
   ```
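
The listing above contains, besides the two Parquet part files, a zero-byte `partition/dir/` placeholder object and a zero-byte `_SUCCESS` marker, both reported with `'type': 'file'`. Whether these entries are what triggers the failure is not established here. As a rough sketch only, assuming one wants to hand PyArrow an explicit list of data files instead of the directory, the helper below filters a `find(..., detail=True)`-style dict. The function name `select_parquet_files` and the shortened sample paths are hypothetical and not from the original report.

```python
from typing import Any, Dict, List


def select_parquet_files(listing: Dict[str, Dict[str, Any]]) -> List[str]:
    """Pick entries from an fsspec-style find(detail=True) listing that look like data files.

    Skips directories, placeholder objects whose name ends with "/",
    marker files starting with "_" or ".", and zero-byte objects.
    """
    selected = []
    for path, info in listing.items():
        basename = path.rsplit("/", 1)[-1]
        if info.get("type") != "file":
            continue  # directory entry
        if not basename:
            continue  # name ends with "/": directory placeholder object
        if basename.startswith(("_", ".")):
            continue  # marker files such as _SUCCESS
        if info.get("size", 0) == 0:
            continue  # empty object, cannot be a valid Parquet file
        selected.append(path)
    return sorted(selected)


# Shortened, made-up stand-in for the listing shown above (placeholder names).
sample_listing = {
    "bucket/partition/dir": {"type": "directory", "size": 0},
    "bucket/partition/dir/": {"type": "file", "size": 0},
    "bucket/partition/dir/_SUCCESS": {"type": "file", "size": 0},
    "bucket/partition/dir/part-00000.snappy.parquet": {"type": "file", "size": 993},
    "bucket/partition/dir/part-00001.snappy.parquet": {"type": "file", "size": 993},
}

data_files = select_parquet_files(sample_listing)
print(data_files)
```

The resulting list could then be passed as the first argument to `pq.ParquetDataset(..., filesystem=gcs)`, which accepts a list of paths. This has not been verified against the bucket in this report.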


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
