renato2099 commented on issue #1305:
URL: 
https://github.com/apache/datafusion-python/issues/1305#issuecomment-3623102011

   I tried this directly in DataFusion, and it works as expected... which is good 😅
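   ```rust
   use std::sync::Arc;
   use datafusion::arrow::array::{ArrayRef, Int32Array, StringArray};
   use datafusion::arrow::record_batch::RecordBatch;
   use datafusion::error::Result;
   use datafusion::prelude::*;

   #[tokio::main]
   async fn main() -> Result<()> {
       let ctx = SessionContext::new();
       // Left table: string column `a` plus join key `b`
       let a: ArrayRef = Arc::new(StringArray::from(vec!["a", "b", "c", "d"]));
       let b: ArrayRef = Arc::new(Int32Array::from(vec![1, 10, 10, 100]));
       let batch = RecordBatch::try_from_iter(vec![("a", a), ("b", b)])?;

       // Right table: only the join key `b`, with no values shared with the left
       let bb: ArrayRef = Arc::new(Int32Array::from(vec![2, 20, 30, 40]));
       let batch2 = RecordBatch::try_from_iter(vec![("b", bb)])?;

       ctx.register_batch("t", batch)?;
       let df = ctx.table("t").await?;

       ctx.register_batch("t2", batch2)?;
       let df2 = ctx.table("t2").await?;

       // Full outer join on `b`; both sides' `b` columns appear in the output
       df.join(df2, JoinType::Full, &["b"], &["b"], None)?.show().await?;
       Ok(())
   }
   ```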
   This prints:
   ```
   +---+-----+----+
   | a | b   | b  |
   +---+-----+----+
   | a | 1   |    |
   | b | 10  |    |
   | c | 10  |    |
   | d | 100 |    |
   |   |     | 2  |
   |   |     | 20 |
   |   |     | 30 |
   |   |     | 40 |
   +---+-----+----+
   ```
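For reference, the table above matches standard full outer join semantics. An unmatched left row keeps its left-side values and gets nulls on the right, and an unmatched right row keeps its right-side values and gets nulls on the left. Below is a minimal pure-Python model of those semantics, run on the same input data. It is only a sketch to make the expected shape explicit: the `full_outer_join` helper is hypothetical and unrelated to DataFusion's actual implementation.

```python
from typing import Any, Optional

Row = dict[str, Any]


def full_outer_join(
    left: list[Row], right: list[Row], key: str
) -> list[tuple[Optional[Row], Optional[Row]]]:
    """Hypothetical reference model of a full outer equi-join on `key`.

    Returns (left_row, right_row) pairs; a side with no match is None,
    which corresponds to the empty (null) cells in the DataFusion output.
    """
    result: list[tuple[Optional[Row], Optional[Row]]] = []
    matched_right: set[int] = set()
    for lrow in left:
        matches = [i for i, rrow in enumerate(right) if rrow[key] == lrow[key]]
        for i in matches:
            matched_right.add(i)
            result.append((lrow, right[i]))
        if not matches:
            # Unmatched left row: keep its values, right side is null
            result.append((lrow, None))
    for i, rrow in enumerate(right):
        if i not in matched_right:
            # Unmatched right row: keep its values, left side is null
            result.append((None, rrow))
    return result


# Same data as the Rust example: no `b` value is shared between the two sides
left = [{"a": "a", "b": 1}, {"a": "b", "b": 10}, {"a": "c", "b": 10}, {"a": "d", "b": 100}]
right = [{"b": 2}, {"b": 20}, {"b": 30}, {"b": 40}]

for lrow, rrow in full_outer_join(left, right, "b"):
    # Columns a and b from the left, then b from the right, like the table above
    print(
        lrow["a"] if lrow else None,
        lrow["b"] if lrow else None,
        rrow["b"] if rrow else None,
    )
```

This prints the same eight rows as the DataFusion table, with `None` in place of the empty cells. Under this model, the left-side values on unmatched rows are never lost. Seeing them come back as `NaN` in the Python output is therefore not expected behavior.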
   
   Also, following your example, I see that all rows are there, but the values from the left side are dropped:
   ```
   >>> merged.to_pandas()
       log_time key_frame
   0        NaN      True
   1        NaN      True
   2        NaN      True
   3        NaN      True
   4        NaN      True
   5        NaN      True
   6        NaN      True
   7        NaN      True
   8        NaN      True
   9        NaN      True
   10       2.0      None
   11       4.0      None
   12       6.0      None
   13       8.0      None
   14      10.0      None
   ```
   And when using the `empty` column, we get the following:
   ```
   >>> merged2.to_pandas()
       log_time  empty key_frame
   0        2.0    0.0      None
   1        4.0    0.0      None
   2        6.0    0.0      None
   3        8.0    0.0      None
   4       10.0    0.0      None
   5        NaN    NaN      True
   6        NaN    NaN      True
   7        NaN    NaN      True
   8        NaN    NaN      True
   9        NaN    NaN      True
   10       NaN    NaN      True
   11       NaN    NaN      True
   12       NaN    NaN      True
   13       NaN    NaN      True
   14       NaN    NaN      True
   ```
   My suspicion is that the values of the left `log_time` column are simply being dropped.
   I will take a closer look on the datafusion-python side :)

