I know outer joins were recently changed to preserve actual nulls through the join (https://github.com/apache/spark/pull/13425). However, I am seeing what looks like inconsistent behavior depending on how the join result is accessed: in one case the default values for the column data types (e.g. 0 for a non-nullable Int) are still used in place of nulls, whereas the other case passes the nulls through correctly.
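Roughly, the pattern is the one sketched below. This is a hypothetical minimal reproduction, not the exact notebook code; the case classes, data, and app name are my own:

// Hypothetical minimal sketch, assuming the 2.0 preview and a
// SparkSession (in a Databricks notebook `spark` already exists).
import org.apache.spark.sql.SparkSession

case class A(id: Long)
case class B(id: Long, count: Int)

val spark = SparkSession.builder().appName("outer-join-nulls").getOrCreate()
import spark.implicits._

val left  = Seq(A(1L), A(2L)).toDS()
val right = Seq(B(1L, 10)).toDS()

// Left outer join; A(2) has no match on the right side.
val joined = left.joinWith(right, left("id") === right("id"), "left_outer")

// Case 1: collecting the typed result directly. Here I would expect
// (A(2),null), but in one code path the default values for the type
// (e.g. B(0,0)) show up instead of null.
joined.collect().foreach(println)

// Case 2: going through the untyped DataFrame view first. Here the
// nulls come through as expected.
joined.toDF("a", "b").show()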
I have a small Databricks notebook showing the full case against the 2.0 preview:
https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/160347920874755/4268263383756277/673639177603143/latest.html

-- 
Richard Marscher
Senior Software Engineer, Localytics