Would be great to document. Probably best with examples.

On Tue, May 8, 2018 at 6:13 AM Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
> The documentation for DataFrame.join()
> <https://spark.apache.org/docs/2.3.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.join>
> lists all the join types we support:
>
>    - inner
>    - cross
>    - outer
>    - full
>    - full_outer
>    - left
>    - left_outer
>    - right
>    - right_outer
>    - left_semi
>    - left_anti
>
> Some of these join types are also listed on the SQL Programming Guide
> <http://spark.apache.org/docs/2.3.0/sql-programming-guide.html#supported-hive-features>.
>
> Is it obvious to everyone what all these different join types are? For
> example, I had never heard of a LEFT ANTI join until stumbling on it in the
> PySpark docs. It’s quite handy! But I had to experiment with it a bit just
> to understand what it does.
>
> I think it would be a good service to our users if we either documented
> these join types ourselves clearly, or provided a link to an external
> resource that documented them sufficiently. I’m happy to file a JIRA about
> this and do the work myself. It would be great if the documentation could
> be expressed as a series of simple doc tests, but brief prose describing
> how each join works would still be valuable.
>
> Does this seem worthwhile to folks here? And does anyone want to offer
> guidance on how best to provide this kind of documentation so that it’s
> easy to find by users, regardless of the language they’re using?
>
> Nick
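
As a rough illustration of the kind of example the docs could carry for the less familiar join types, here is a plain-Python model of what left_semi and left_anti return. It is only a sketch of the semantics, not Spark code: the helper functions `left_semi` and `left_anti`, the sample rows, and the `id` key are all made up for illustration. In PySpark itself the corresponding call would be along the lines of `df_left.join(df_right, on="id", how="left_anti")`, where `df_left` and `df_right` are hypothetical DataFrames.

```python
# Plain-Python model of two join types. Illustrative only: this is not how
# Spark implements joins, and all names and rows below are hypothetical.

left = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
    {"id": 3, "name": "carol"},
]
right = [
    {"id": 2, "city": "Oslo"},
    {"id": 3, "city": "Lima"},
    {"id": 3, "city": "Kyiv"},  # second match for id 3
    {"id": 4, "city": "Pune"},
]


def _match_keys(rows, key):
    # A null (None) key never equals anything in a join condition, so skip it.
    return {row[key] for row in rows if row[key] is not None}


def left_semi(left_rows, right_rows, key):
    """Left rows that have at least one match on the right (left columns only)."""
    keys = _match_keys(right_rows, key)
    return [row for row in left_rows if row[key] in keys]


def left_anti(left_rows, right_rows, key):
    """Left rows that have no match on the right (left columns only)."""
    keys = _match_keys(right_rows, key)
    return [row for row in left_rows if row[key] not in keys]


semi = left_semi(left, right, "id")
anti = left_anti(left, right, "id")
print("left_semi:", semi)
print("left_anti:", anti)
```

Note that in the left_semi result carol appears once even though two rows on the right match her, and no columns from the right side come back. That is the main difference from an inner join. left_anti is the complement: only alice, who has no match at all.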