sahil1105 commented on code in PR #43661:
URL: https://github.com/apache/arrow/pull/43661#discussion_r1715751879


##########
cpp/src/arrow/dataset/file_parquet.cc:
##########
@@ -555,6 +562,57 @@ Future<std::shared_ptr<parquet::arrow::FileReader>> 
ParquetFileFormat::GetReader
       });
 }
 
+struct CastingGenerator {

Review Comment:
   > Parquet logical type doesn't have an arrow schema, isn't it? 
   
   As far as I understand, the Parquet metadata may or may not contain the Arrow 
schema; I believe it depends on the writer. It looks like `SchemaManifest::Make` 
tries to retrieve it using `GetOriginSchema`. However, the schema at write time 
might not be the same as the schema the reader expects.
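   
   To make the concern concrete, here is a minimal standalone sketch (not Arrow code). It assumes types can be modelled as plain strings and that the writer's schema, when stored, lives under the `ARROW:schema` key of the file's key-value metadata. In a real file that value is a serialized schema rather than a type name, and `OriginType`, `ResolveFileType` and `NeedsCast` are hypothetical helper names used only for illustration.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for the Parquet key-value metadata (not an Arrow type).
using KeyValueMetadata = std::map<std::string, std::string>;

// Metadata key under which an Arrow writer may store its schema.
const std::string kArrowSchemaKey = "ARROW:schema";

// Returns the type recorded by the writer, or nullopt if the writer
// did not store a schema. Simplification: the value is treated as a
// type name instead of a serialized schema.
std::optional<std::string> OriginType(const KeyValueMetadata& metadata) {
  auto it = metadata.find(kArrowSchemaKey);
  if (it == metadata.end()) return std::nullopt;
  return it->second;
}

// Type the file reader would produce: the stored type when present,
// otherwise the default inferred from the Parquet logical type.
std::string ResolveFileType(const KeyValueMetadata& metadata,
                            const std::string& inferred_default) {
  return OriginType(metadata).value_or(inferred_default);
}

// A cast is needed whenever the file's type differs from the type
// the reader expects, whether or not a schema was stored.
bool NeedsCast(const std::string& file_type, const std::string& expected_type) {
  return file_type != expected_type;
}
```

   Under these assumptions, a file written without a stored schema resolves to the inferred default (say `string`), so a reader expecting `large_string` still needs a cast. A file written with `large_string` stored needs one too if the reader expects `string_view`.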
   
   > Binary reader reads from ::arrow::BinaryBuilder, and casting it to 
user-specified binary type.
   
   Sorry, I didn't quite follow. Are you saying that we should use this to do 
the cast at read time somehow?
   
   > I think a native cast is better here but this doesn't solve your problem, 
perhaps I can try to add a naive SchemaManifest with hint solving here, but 
it would take some time.
   > Maybe we should rethink the GetTypeForNode handling for 
string/large_string/stringView, or using some handle written type hint here.
   
   That makes sense to me.
   
   > Maybe I can add separate issue for that
   
   That would be great, thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
