brancz opened a new issue, #8840:
URL: https://github.com/apache/arrow-rs/issues/8840

   **Describe the bug**
   
   When merging two schemas using `Schema::try_merge` where one schema has a 
column and the other doesn't, the merged field keeps the nullability setting of 
the existing column. This doesn't make sense semantically: the merged field 
should have `nullable` set to true, since that is the implicit property of the 
schema that doesn't have the field.
   
   As a consequence, it's impossible to merge schemas, and therefore record 
batches, where one side has a field with `nullable` set to false and the other 
doesn't have the field at all.
   
   **To Reproduce**
   
   ```
       use arrow_schema::{DataType, Field, Schema};

       #[test]
       fn test_schema_merge_nullability() {
           let merged = Schema::try_merge(vec![
               Schema::new(vec![
                   Field::new("first_name", DataType::Utf8, false),
               ]),
               Schema::new(vec![
                   Field::new("last_name", DataType::Utf8, false),
               ]),
           ])
           .unwrap();
   
           assert_eq!(
               merged,
               Schema::new(
                   vec![
                       Field::new("first_name", DataType::Utf8, true),
                       Field::new("last_name", DataType::Utf8, true),
                   ],
               )
           );
       }
   ```
   
   **Expected behavior**
   
   The above test passes.
   
   **Additional context**
   
   This assumes that the whole point of schema merging is to merge record 
batches that use the merged schemas. If that assumption is wrong, we can solve 
this on our side by writing our own schema merge functionality. However, I see 
no useful reason to merge schemas without also merging record batches, and if 
that is the intention, I think this is a legitimate bug.
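
For illustration, here is a minimal sketch of the merge rule described above. It uses only the standard library and a simplified stand-in type: `SketchField` and `merge_missing_as_nullable` are hypothetical names, not part of `arrow_schema`, and data types and metadata are ignored. It is meant to show the intended nullability semantics, not how `Schema::try_merge` is or should be implemented.

```rust
/// Simplified stand-in for an Arrow field (hypothetical type, not
/// `arrow_schema::Field`): only the name and nullability are modelled.
#[derive(Debug, Clone, PartialEq)]
struct SketchField {
    name: String,
    nullable: bool,
}

impl SketchField {
    fn new(name: &str, nullable: bool) -> Self {
        Self { name: name.to_string(), nullable }
    }
}

/// Merge schemas (modelled as lists of fields). A field that is missing from
/// at least one input schema is marked nullable in the result.
fn merge_missing_as_nullable(schemas: &[Vec<SketchField>]) -> Vec<SketchField> {
    let mut merged: Vec<SketchField> = Vec::new();

    // First pass: collect fields in order of first appearance. When a field
    // appears more than once, it is nullable if any occurrence is nullable.
    for schema in schemas {
        for field in schema {
            match merged.iter_mut().find(|m| m.name == field.name) {
                Some(existing) => existing.nullable |= field.nullable,
                None => merged.push(field.clone()),
            }
        }
    }

    // Second pass: a field absent from any input schema must be nullable,
    // because batches written with that schema have no values for it.
    for m in merged.iter_mut() {
        let present_everywhere = schemas
            .iter()
            .all(|schema| schema.iter().any(|f| f.name == m.name));
        if !present_everywhere {
            m.nullable = true;
        }
    }

    merged
}
```

Under this rule, merging a schema that has only a non-nullable `first_name` with one that has only a non-nullable `last_name` gives two nullable fields, which matches the expectation in the test above. A field that is present and non-nullable in every input stays non-nullable.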
   
   @alamb @vegarsti

