pitrou commented on code in PR #14029:
URL: https://github.com/apache/arrow/pull/14029#discussion_r981522367
##########
python/pyarrow/tests/parquet/test_metadata.py:
##########
@@ -433,7 +434,9 @@ def test_write_metadata(tempdir):
assert parquet_meta_mult.num_row_groups == 2
# append metadata with different schema raises an error
- with pytest.raises(RuntimeError, match="requires equal schemas"):
+ msg = ("AppendRowGroups requires equal schemas.\n"
+ "These two columns differ at index: 0")
Review Comment:
The message is a bit confusing and it took me some time to understand what
"differ at index 0" meant.
I would suggest "The two columns with index 0 differ".
##########
python/pyarrow/tests/parquet/test_metadata.py:
##########
@@ -15,6 +15,7 @@
# specific language governing permissions and limitations
# under the License.
+import io
Review Comment:
Can we keep stdlib imports alphabetically ordered?
##########
python/pyarrow/tests/parquet/test_metadata.py:
##########
@@ -580,3 +583,33 @@ def test_metadata_equals():
match = "Argument 'other' has incorrect type"
with pytest.raises(TypeError, match=match):
original_metadata.equals(None)
+
+
[email protected]("t1,t2,expected", (
+ ({'col1': range(10)}, {'col1': range(10)}, None),
+ ({'col1': range(10)}, {'col2': range(10)},
+ "These two columns differ at index: 0"),
+ ({'col1': range(10), 'col2': range(10)}, {'col3': range(10)},
+ "This schema has 2 columns, other has 1")
+))
+def test_metadata_append_row_groups_diff(t1, t2, expected):
Review Comment:
For clarity
```suggestion
def test_metadata_append_row_groups_diff(t1, t2, expected_error):
```
##########
cpp/src/parquet/schema.h:
##########
@@ -433,6 +433,8 @@ class PARQUET_EXPORT SchemaDescriptor {
int ColumnIndex(const schema::Node& node) const;
bool Equals(const SchemaDescriptor& other) const;
+ bool Equals(const SchemaDescriptor& other,
+ std::shared_ptr<std::stringstream> diff_msg) const;
Review Comment:
Three things: 1) this should take a raw pointer (no need to give ownership),
2) this should take a generic `ostream` not a specific implementation, 3) no
need for two separate signatures as the second argument can be made optional
```suggestion
bool Equals(const SchemaDescriptor& other,
std::ostream* diff_output = NULLPTR) const;
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]