joosthooz commented on PR #13820:
URL: https://github.com/apache/arrow/pull/13820#issuecomment-1214908119

   I added a test by copying and adapting the one for the CSV reader, but ran 
into the following problem:
   The test for Latin-1 encoding passes. However, the UTF-16 test fails, 
apparently because of a problem in schema detection. The equality check 
fails even though the schemas look identical from the Python side (both 
sides of the diff that pytest prints are the same). Adding some printfs 
to the C++ code shows that the fields of `dataset.schema` appear to be empty:
   ```
   Schema Equals():
   this fp: 'S{Fn', other fp: 'S{Fna{@N};Fnb{@O};L}'
   this field: '', that field: 'a: string'Schema Equals(): field 0 not equal
   ```
   This looks to me like a separate problem with detecting the schema of a 
UTF-16 encoded file. Should I try to create a reproducible example and file a 
new JIRA issue, or is this something we should address here?
   In the meantime, I removed the test in 
[47a3462](https://github.com/apache/arrow/pull/13820/commits/47a3462b756cf92594470cedcd0f56eaf6248016)
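
   As a starting point for a reproducible example, the two encoded inputs can 
be built with the Python standard library alone. This is only a sketch of the 
fixtures, not of the failing test: it does not call Arrow, the row values are 
made up, and the helper names `encode_csv` and `header_fields` are 
hypothetical. It shows that both files carry the same header once decoded, 
while the raw UTF-16 bytes start with a byte-order mark and contain NUL bytes.

```python
import csv
import io

# Hypothetical two-column table; the column names mirror the fields 'a' and 'b'
# from the fingerprint above, the values are invented for illustration.
ROWS = [("a", "b"), ("un", "éléphant")]


def encode_csv(rows, encoding):
    """Serialize rows to CSV text, then encode the text with the given codec."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue().encode(encoding)


def header_fields(data, encoding):
    """Decode the bytes with the given codec and return the header row."""
    text = data.decode(encoding)
    return next(csv.reader(io.StringIO(text)))


latin1_bytes = encode_csv(ROWS, "latin-1")
utf16_bytes = encode_csv(ROWS, "utf-16")  # Python prepends a byte-order mark

# After transcoding, both inputs yield the same header.
print(header_fields(latin1_bytes, "latin-1"))
print(header_fields(utf16_bytes, "utf-16"))

# Without transcoding, the UTF-16 input starts with a BOM and contains NULs.
print(utf16_bytes[:2], b"\x00" in utf16_bytes)
```

Writing `latin1_bytes` and `utf16_bytes` to temporary files would give the 
inputs for a dataset-level reproduction.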

