AlenkaF commented on code in PR #13629:
URL: https://github.com/apache/arrow/pull/13629#discussion_r923074731


##########
python/pyarrow/tests/parquet/test_metadata.py:
##########
@@ -531,3 +532,19 @@ def test_metadata_exceeds_message_size():
         buf = out.getvalue()
 
     metadata = pq.read_metadata(pa.BufferReader(buf))
+
+
+def test_metadata_schema_filesystem(tmpdir):
+    table = pa.table({"a": [1, 2, 3]})
+
+    # URI writing to local file.
+    file_path = 'file:///' + os.path.join(str(tmpdir), "data.parquet")
+
+    pq.write_table(table, file_path)
+
+    # Get expected `metadata` from path.
+    metadata = pq.read_metadata(tmpdir / '/data.parquet')

Review Comment:
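   Running this locally, I can confirm a segfault. I think it happens because the table's schema metadata is (correctly) empty: `table.schema.metadata` is `None`, and that `None` is what ends up being passed to `FileMetaData.equals`: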
   
   ```python
   >>> import pyarrow as pa
   >>> import pyarrow.parquet as pq
   
   >>> table = pa.table({"a": [1, 2, 3]})
   >>> file_path = "/tmp/data.parquet"
   >>> pq.write_table(table, file_path)
   >>> metadata = table.schema.metadata  # None: the table has no metadata
   
   >>> pq.read_metadata(file_path).equals(metadata)
   zsh: segmentation fault  python
   ```
   
   This probably deserves its own issue: comparing against `None` should fail gracefully (for example, by returning a warning) instead of crashing the interpreter.
   
   A better way to test the Parquet file metadata might be to inspect individual attributes:
   
   ```python
   >>> pq.read_metadata(file_path).num_columns == 1
   True
   >>> pq.read_metadata(file_path).num_rows == 3
   True
   ```


