setop commented on issue #4799:
URL: https://github.com/apache/arrow-rs/issues/4799#issuecomment-2132836977
Same issue with version 51.0.0.
Starting from one large CSV split in two, I created the first Parquet file
with parquet-cpp-arrow:
```
Metadata for file: E2021.parquet
version: 1
num of rows: 4283692
created by: parquet-cpp-arrow version 5.0.0
metadata:
ARROW:schema:
/////4ABAAAQAAAAAAAKAAwABgAFAAgACgAAAAABBAAMAAAACAAIAAAABAAIAAAABAAAAAYAAAAUAQAA0AAAAJwAAABoAAAAOAAAAAQAAAAU////AAABAxAAAAAcAAAABAAAAAAAAAAFAAAAcHJpY2UABgAIAAYABgAAAAAAAgBE////AAABAhAAAAAUAAAABAAAAAAAAAADAAAAZGF5ADD///8AAAABQAAAAHD///8AAAECEAAAABgAAAAEAAAAAAAAAAUAAABtb250aAAAAGD///8AAAABQAAAAKD///8AAAECEAAAABgAAAAEAAAAAAAAAAQAAAB5ZWFyAAAAAJD///8AAAABQAAAAND///8AAAECEAAAABgAAAAEAAAAAAAAAAQAAABmdWVsAAAAAMD///8AAAABQAAAABAAFAAIAAYABwAMAAAAEAAQAAAAAAABAhAAAAAgAAAABAAAAAAAAAAFAAAAcGR2aWQAAAAIAAwACAAHAAgAAAAAAAABQAAAAAAAAAA=
message schema {
OPTIONAL INT64 pdvid;
OPTIONAL INT64 fuel;
OPTIONAL INT64 year;
OPTIONAL INT64 month;
OPTIONAL INT64 day;
OPTIONAL DOUBLE price;
}
```
Then, using the same schema (`message schema { ... }` in a file), I created
the second half with parquet-rs:
```
Metadata for file: E2022.parquet
version: 1
num of rows: 5044596
created by: parquet-rs version 51.0.0
metadata:
ARROW:schema:
/////5ABAAAQAAAAAAAKAAwACgAJAAQACgAAABAAAAAAAQQACAAIAAAABAAIAAAABAAAAAYAAAAoAQAA5AAAALAAAAB8AAAATAAAABQAAAAQABYAEAAOAA8ABAAAAAgAEAAAABgAAAAcAAAAAAABAxgAAAAAAAYACAAGAAYAAAAAAAIAAAAAAAUAAABwcmljZQAAAET///8QAAAAGAAAAAAAAQIUAAAANP///0AAAAAAAAABAAAAAAMAAABkYXkAcP///xAAAAAYAAAAAAABAhQAAABg////QAAAAAAAAAEAAAAABQAAAG1vbnRoAAAAoP///xAAAAAYAAAAAAABAhQAAACQ////QAAAAAAAAAEAAAAABAAAAHllYXIAAAAA0P///xAAAAAYAAAAAAABAhQAAADA////QAAAAAAAAAEAAAAABAAAAGZ1ZWwAAAAAEAAUABAADgAPAAQAAAAIABAAAAAYAAAAIAAAAAAAAQIcAAAACAAMAAQACwAIAAAAQAAAAAAAAAEAAAAABQAAAHBkdmlkAAAA
message arrow_schema {
OPTIONAL INT64 pdvid;
OPTIONAL INT64 fuel;
OPTIONAL INT64 year;
OPTIONAL INT64 month;
OPTIONAL INT64 day;
OPTIONAL DOUBLE price;
}
```
When I try to concat them, I get `Error: General("inputs must have the same
schema ...`
The only difference is the name of the root schema, `schema` vs `arrow_schema`:
```diff
--- a.txt 2024-05-27 09:32:48.409232203 +0200
+++ b.txt 2024-05-27 09:32:55.073572763 +0200
@@ -1,6 +1,6 @@
GroupType {
basic_info: BasicTypeInfo {
- name: \"schema\",
+ name: \"arrow_schema\",
repetition: None,
converted_type: NONE,
logical_type: None,
```
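To make the mismatch concrete, here is a minimal sketch of a comparison that looks only at the fields and ignores the root group name. The `Field` and `Schema` types and the `same_fields` helper are hypothetical stand-ins written for illustration; they are not the parquet crate's own types, and whether the concat tool should relax its check this way is a question for the maintainers.

```rust
// Hypothetical, simplified model of a Parquet schema. These types are
// illustrative only and are NOT the parquet crate's own types.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    physical_type: String,
    optional: bool,
}

#[derive(Debug, Clone, PartialEq)]
struct Schema {
    root_name: String,
    fields: Vec<Field>,
}

/// Compare two schemas by their fields only, ignoring the root group name.
fn same_fields(a: &Schema, b: &Schema) -> bool {
    a.fields == b.fields
}

/// Build the six-column schema from the report under a given root name.
fn example_schema(root_name: &str) -> Schema {
    let mut fields: Vec<Field> = ["pdvid", "fuel", "year", "month", "day"]
        .iter()
        .map(|name| Field {
            name: name.to_string(),
            physical_type: "INT64".to_string(),
            optional: true,
        })
        .collect();
    fields.push(Field {
        name: "price".to_string(),
        physical_type: "DOUBLE".to_string(),
        optional: true,
    });
    Schema {
        root_name: root_name.to_string(),
        fields,
    }
}

fn main() {
    let from_cpp = example_schema("schema");
    let from_rs = example_schema("arrow_schema");

    // Derived equality also compares the root name, so this is false.
    println!("strict equality: {}", from_cpp == from_rs);
    // Comparing only the fields treats the two files as compatible.
    println!("equal ignoring root name: {}", same_fields(&from_cpp, &from_rs));
}
```

With the two schemas above, strict equality fails only because of `root_name`, while the field-only comparison succeeds.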