[GitHub] [arrow] jorisvandenbossche commented on pull request #9677: ARROW-11260: [C++][Dataset] Don't require dictionaries when specifying explicit partition schema

GitBox Fri, 12 Mar 2021 12:30:51 -0800


jorisvandenbossche commented on pull request #9677:
URL: https://github.com/apache/arrow/pull/9677#issuecomment-797737946



   > We could; we'd have to do that recursively, right? In case of a nested dictionary. (…though is that handled anyways?)
   
   I don't think we can parse nested types from the file paths? 
   In that case, we wouldn't need to check it recursively.
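   
   A minimal sketch of why a recursive check seems unnecessary, assuming hive-style `key=value` directory names. `parse_hive_segments` is a hypothetical helper for illustration, not Arrow's actual parser: every value recovered from a path is a flat string, so there is nothing nested to recurse into.

```python
def parse_hive_segments(path: str) -> dict:
    """Collect key=value directory segments of a path into a flat dict.

    Hypothetical helper for illustration only; not Arrow's parser.
    """
    fields = {}
    for segment in path.split("/"):
        key, sep, value = segment.partition("=")
        if sep and key:  # skip segments without '=' (e.g. the file name)
            fields[key] = value  # always a plain string, never a nested value
    return fields


print(parse_hive_segments("year=2021/month=03/part-0.parquet"))
```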
   
   From a user's point of view, having to specify `dictionaries="infer"` feels superfluous, since a dictionary type in the schema without any dictionaries passed already makes it clear that inference is needed (but to be clear, this PR is already a nice improvement compared to the current situation! ;))
   
   
   > It also doesn't help the fact that we need a Partitioning, not a PartitioningFactory, when we want to write data, so the auto-detection might be a little too magical…
   
   Hmm, yes, that complicates things. When writing, you don't need to specify the dictionaries. But you do still need the actual Partitioning and not the factory. So returning the factory *if* the schema has a dictionary type and no dictionaries are passed would then fail when writing.
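   
   To make the failure mode concrete, here is a toy model of that auto-detection. All names (`Partitioning`, `PartitioningFactory`, `partitioning`, `write_dataset`) are simplified, hypothetical stand-ins written for this sketch, not the real Arrow classes or signatures; field types are plain strings.

```python
class Partitioning:
    """Stand-in for a fully specified partitioning (usable for reading and writing)."""

    def __init__(self, fields, dictionaries=None):
        self.fields = fields
        self.dictionaries = dictionaries or {}


class PartitioningFactory:
    """Stand-in for a factory that must inspect existing paths first."""

    def __init__(self, fields):
        self.fields = fields


def partitioning(fields, dictionaries=None):
    """Hypothetical auto-detection: fall back to a factory when a
    dictionary-typed field has no dictionary values supplied."""
    has_dict_field = any(t == "dictionary" for t in fields.values())
    if has_dict_field and dictionaries is None:
        return PartitioningFactory(fields)
    return Partitioning(fields, dictionaries)


def write_dataset(rows, part):
    """Writing needs a concrete Partitioning; a factory is rejected."""
    if not isinstance(part, Partitioning):
        raise TypeError("writing requires a Partitioning, not a PartitioningFactory")
    return sorted(part.fields)


# Same call, but the result is no longer usable for writing.
part = partitioning({"year": "dictionary"})
try:
    write_dataset([], part)
except TypeError as exc:
    print(exc)
```

   In this model the caller never asked for a factory, yet the write path fails, which is the "too magical" behaviour mentioned in the quote.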
   
   The current API, which mixes reading and writing as well as the full Partitioning object and the factory, makes this a bit complex.


