Fokko commented on PR #36846:
URL: https://github.com/apache/arrow/pull/36846#issuecomment-1744547621

   Thanks @jorisvandenbossche for summarizing the pending issues. I was 
rebasing to see if the LLVM stuff has been fixed.
   
   > Do we need the pyarrow.FieldMergeOptions class? At the moment it actually 
doesn't really do anything, since we don't (yet) allow to tweak individual 
flags, only "permissive"/"default" (and the class itself actually also only 
exposes "permissive", not "default", as a static method to create that options 
object). Maybe we can just remove it for now, we can always add it back later 
if we want to expose individual options.
   
   I don't think it adds much value. Just accepting a string is much clearer to 
the end-user, and I think the majority of the people don't the fine-grained 
tuning.
   
   > For concat_tables, I find it confusing that a user still has to specify 
promote=True when they already specified field_merge_options="permissive" 
(because the latter only makes sense with promote=True). Essentially the 
promote=False can be seen as a "no promotion" field merge option?
   I think with this new promotion capability, the historically added promote 
keyword is unfortunately badly chosen. Essentially it means: "unify schemas" 
(that's also how it is called on the C++ side), and by default it hardly does 
any promotion (only non-existing fields or null fields to any type). If we keep 
promote keyword, I would expect that promote=True would do actual, permissive 
promotion.
   To be clear, this is not caused by this PR, but I would still prefer to get 
a nicer API. One option is to actually rename promote to unify_schemas (which 
is what it does). We can still keep promote as an alias for now for backwards 
compatibility.
   
   This is a good point, and I agree that `promote` is poorly chosen. You could 
have the situation that you don't want to add null fields, but you allow the 
promotion of the fields. I've added the `none` option as you suggested, and I 
think this is much cleaner. Also added support for `promote` if people still 
rely on it. It will emit a warning to let users know that they should switch to 
`promote_options`.
   
   > The keyword names: pa.concat_tables uses field_merge_options, 
pa.unify_schemas uses options. Should we try to align those names? I mentioned 
that I would like promote_options, which I think is more descriptive, but on 
the other hand deviates from the C++ terminology ..
   
   I like `promote_options` as well. Aligned the names there.
   
   > The last two bullet points are more about the final Python API, and 
doesn't impact the large part of the changes in the PR (the actual C++ 
changes), so maybe I shouldn't block the PR on those, and I am also fine with 
discussing those in a follow-up.
   
   We should not merge anything that we're going to rip out anyway, but I'm 
happy to help with some of the follow-ups. Getting this one in would be great 
since it is quite big already.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to