I think the Hive hash implementation was needed to add support for bucketed Hive tables. As Tyson noted, if the other side of a join can be bucketed the same way, then Spark can use a bucketed join. I have long-term plans to support this in the DataSourceV2 API, but I don't think we are very close to implementing it yet.
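To make the bucketed-join point concrete, here is a rough sketch in plain Python. It is not Spark's code. All function names are hypothetical, and the hash is a simplified 31-based rolling hash written in the style of Hive's string hash, which is an assumption for illustration rather than a verified copy. The point is only that when both sides are bucketed with the same hash function and the same bucket count, matching keys land in the same bucket, so each pair of buckets can be joined locally.

```python
from collections import defaultdict

INT_MAX = 0x7FFFFFFF  # same value as Java's Integer.MAX_VALUE


def hive_style_hash(key: str) -> int:
    """Illustrative 31-based rolling hash over UTF-8 bytes (assumed, simplified).

    The result is wrapped to a signed 32-bit integer to mimic Java int overflow.
    """
    h = 0
    for b in key.encode("utf-8"):
        signed = b - 256 if b > 127 else b  # treat each byte as signed
        h = (h * 31 + signed) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def bucket_id(key: str, num_buckets: int) -> int:
    """Map a key to a non-negative bucket index."""
    return (hive_style_hash(key) & INT_MAX) % num_buckets


def bucketize(rows, num_buckets):
    """Group (key, value) rows by bucket index."""
    buckets = defaultdict(list)
    for key, value in rows:
        buckets[bucket_id(key, num_buckets)].append((key, value))
    return buckets


def bucketed_join(left, right, num_buckets):
    """Inner-join two tables one bucket at a time.

    Rows are only compared within the same bucket index, which is valid
    because both sides use the same hash and the same bucket count.
    """
    left_buckets = bucketize(left, num_buckets)
    right_buckets = bucketize(right, num_buckets)
    joined = []
    for b in range(num_buckets):
        index = defaultdict(list)
        for key, value in left_buckets.get(b, []):
            index[key].append(value)
        for key, value in right_buckets.get(b, []):
            for left_value in index.get(key, []):
                joined.append((key, left_value, value))
    return joined


if __name__ == "__main__":
    table_a = [("a", 1), ("b", 2), ("c", 3)]
    table_b = [("b", 20), ("c", 30), ("d", 40)]
    print(sorted(bucketed_join(table_a, table_b, num_buckets=4)))
```

If one side were bucketed with a different hash function (Murmur3, say), equal keys could fall into different bucket indexes, and the per-bucket join above would miss matches. That is why one side would have to be reshuffled with the other side's hash before the join.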
rb

On Wed, Mar 6, 2019 at 1:57 PM Reynold Xin <r...@databricks.com> wrote:

> I think they might be used in bucketing? Not 100% sure.
>
> On Wed, Mar 06, 2019 at 1:40 PM, <tcon...@gmail.com> wrote:
>
>> Hi,
>>
>> I noticed the existence of a Hive Hash partitioning implementation in
>> Spark, but also noticed that it’s not being used, and that the Spark hash
>> partitioning function is presently hardcoded to Murmur3. My question is
>> whether Hive Hash is dead code or are there future plans to support reading
>> and understanding data that has been partitioned using Hive Hash? By
>> understanding, I mean that I’m able to avoid a full shuffle join on Table A
>> (partitioned by Hive Hash) when joining with a Table B that I can shuffle
>> via Hive Hash to Table A.
>>
>> Thank you,
>>
>> Tyson

--
Ryan Blue
Software Engineer
Netflix