litao3rd commented on issue #37840:
URL: https://github.com/apache/arrow/issues/37840#issuecomment-1734935527

   > @bkietz I found a problem, when `UnifySchemas` called, it calles 
`SchemaBuilder::AddField`, when schema has same name, it has conflict solving 
options, when using `CONFLICT_MERGE`, it will call `Field::MergeWith`:
   > 
   > ```c++
   > Result<std::shared_ptr<Field>> Field::MergeWith(const Field& other,
   >                                                 MergeOptions options) 
const {
   >   if (name() != other.name()) {
   >     return Status::Invalid("Field ", name(), " doesn't have the same name 
as ",
   >                            other.name());
   >   }
   > 
   >   if (Equals(other, /*check_metadata=*/false)) {
   >     return Copy();
   >   }
   > 
   >   if (options.promote_nullability) {
   >     if (type()->Equals(other.type())) {
   >       return Copy()->WithNullable(nullable() || other.nullable());
   >     }
   >     std::shared_ptr<Field> promoted = MaybePromoteNullTypes(*this, other);
   >     if (promoted) return promoted;
   >   }
   > 
   >   return Status::Invalid("Unable to merge: Field ", name(),
   >                          " has incompatible types: ", type()->ToString(), 
" vs ",
   >                          other.type()->ToString());
   > }
   > ```
   > 
   > In this case, if we don't set explicit schema, and deduce by parquets, it 
will says `string` and `large_string` is not compatible and failed. Would this 
better having a Common type here? Or this case is expected ?
   
   I think the current approach is reasonable. However, having a common type as 
an option would also be good, but I don't want this common type to be set as 
the default. This case is expected for me.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to