alamb commented on issue #12510:
URL: https://github.com/apache/datafusion/issues/12510#issuecomment-2366773157
I looked into this issue more -- I think fundamentally the schema is
different in the files, and short of some sort of configuration to always
cast Binary --> String, there isn't any way we would be able to special
case this.
hits.parquet
```
Metadata for file: hits.parquet
version: 1
num of rows: 99997497
created by: parquet-cpp version 1.5.1-SNAPSHOT
message schema {
REQUIRED INT64 WatchID;
REQUIRED INT32 JavaEnable (INTEGER(16,true));
REQUIRED BYTE_ARRAY Title (STRING);
...
```
Thus I am closing this issue as won't do -- please let me know if you have
found something different @thinh2
hits_partitioned/hits_55.parquet
```
Metadata for file: hits_partitioned/hits_55.parquet
version: 1
num of rows: 1000000
created by: parquet-cpp version 1.5.1-SNAPSHOT
message schema {
OPTIONAL INT64 WatchID;
OPTIONAL INT32 JavaEnable (INTEGER(16,true));
OPTIONAL BYTE_ARRAY Title;
...
```
[hits_55.parquet.schema.txt](https://github.com/user-attachments/files/17089765/hits_55.parquet.schema.txt)
[hits.parquet.schema.txt](https://github.com/user-attachments/files/17089766/hits.parquet.schema.txt)
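To make the difference between the two dumps explicit, here is a minimal sketch that parses the schema excerpts shown above and lists the columns whose declarations differ. It is plain Python over the printed text, not a DataFusion or Parquet API; the helper names (`parse_schema`, `diff_schemas`) and the regular expression are assumptions made up for this illustration, and only the three fields visible in the excerpts are covered.

```python
import re

# Schema excerpts copied from the two metadata dumps above
# (the fields elided with "..." are not included).
HITS = """
REQUIRED INT64 WatchID;
REQUIRED INT32 JavaEnable (INTEGER(16,true));
REQUIRED BYTE_ARRAY Title (STRING);
"""

HITS_55 = """
OPTIONAL INT64 WatchID;
OPTIONAL INT32 JavaEnable (INTEGER(16,true));
OPTIONAL BYTE_ARRAY Title;
"""

# Groups: repetition, physical type, column name, optional logical annotation.
FIELD = re.compile(
    r"^(REQUIRED|OPTIONAL|REPEATED)\s+(\w+)\s+(\w+)(?:\s+\((.*)\))?;$"
)


def parse_schema(text):
    """Hypothetical helper: map column name -> (repetition, physical, logical)."""
    fields = {}
    for line in text.strip().splitlines():
        match = FIELD.match(line.strip())
        if match:
            repetition, physical, name, logical = match.groups()
            fields[name] = (repetition, physical, logical)
    return fields


def diff_schemas(a, b):
    """Hypothetical helper: columns of `a` whose entry differs in `b`."""
    return {name: (a[name], b.get(name)) for name in a if a[name] != b.get(name)}


differences = diff_schemas(parse_schema(HITS), parse_schema(HITS_55))

# Columns where the logical annotation itself differs (not just REQUIRED/OPTIONAL).
annotation_changes = [
    name
    for name, (left, right) in differences.items()
    if right is not None and left[2] != right[2]
]

for name, (left, right) in differences.items():
    print(name, left, right)
print("annotation differs:", annotation_changes)
```

In these excerpts every column differs in repetition (REQUIRED vs OPTIONAL), but only `Title` differs in its annotation: it carries `(STRING)` in hits.parquet and has no annotation in hits_55.parquet, which is the Binary vs String mismatch discussed above.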
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]