Mildly off-topic: From a *correctness* perspective only, it seems Spark can read bucketed Hive tables just fine. I am ignoring the fact that Spark doesn't take advantage of the bucketing.
Is that a fair assessment? Or is it more complicated than that? Also, Spark has code to prevent an application from accidentally writing to a bucketed Hive table (though there is a hole: <https://issues.apache.org/jira/browse/SPARK-27498>). Except for that hole, the write case is covered. Spark apps reading bucketed Hive tables seem to be common, so I hope it works (as it seems to).

On Thu, Mar 7, 2019 at 12:58 PM <tcon...@gmail.com> wrote:

> Thanks Ryan and Reynold for the information!
>
> Cheers,
> Tyson
>
> *From:* Ryan Blue <rb...@netflix.com>
> *Sent:* Wednesday, March 6, 2019 3:47 PM
> *To:* Reynold Xin <r...@databricks.com>
> *Cc:* tcon...@gmail.com; Spark Dev List <dev@spark.apache.org>
> *Subject:* Re: Hive Hash in Spark
>
> I think this was needed to add support for bucketed Hive tables. Like
> Tyson noted, if the other side of a join can be bucketed the same way, then
> Spark can use a bucketed join. I have long-term plans to support this in
> the DataSourceV2 API, but I don't think we are very close to implementing
> it yet.
>
> rb
>
> On Wed, Mar 6, 2019 at 1:57 PM Reynold Xin <r...@databricks.com> wrote:
>
> I think they might be used in bucketing? Not 100% sure.
>
> On Wed, Mar 06, 2019 at 1:40 PM, <tcon...@gmail.com> wrote:
>
> Hi,
>
> I noticed the existence of a Hive Hash partitioning implementation in
> Spark, but also noticed that it's not being used, and that the Spark hash
> partitioning function is presently hardcoded to Murmur3. My question is
> whether Hive Hash is dead code or are there future plans to support reading
> and understanding data that has been partitioned using Hive Hash? By
> understanding, I mean that I'm able to avoid a full shuffle join on Table A
> (partitioned by Hive Hash) when joining with a Table B that I can shuffle
> via Hive Hash to Table A.
>
> Thank you,
> Tyson
>
> --
> Ryan Blue
> Software Engineer
> Netflix
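For anyone following along, here is a simplified sketch of why the hash function matters for the bucketed-join case discussed above. It is not Spark's or Hive's actual implementation; `hive_bucket` is an illustrative name, and the rule shown (Java `hashCode` masked to non-negative, modulo the bucket count, with an int's `hashCode` being the value itself) is a common description of Hive's integer-key bucketing, simplified to a single int column.

```python
# Simplified sketch of Hive-style bucket assignment for a single int key.
# Assumption (simplified): Hive places a row in bucket
#   (hashCode & Integer.MAX_VALUE) % numBuckets,
# and for a Java int the hashCode is the value itself.

INT_MAX = 0x7FFFFFFF  # Java Integer.MAX_VALUE


def hive_bucket(key: int, num_buckets: int) -> int:
    """Bucket index a Hive-bucketed table would use for an int key (sketch)."""
    return (key & INT_MAX) % num_buckets


if __name__ == "__main__":
    # Rows with equal keys always land in the same bucket, so a join on `key`
    # between two tables bucketed the same way needs no shuffle -- but only if
    # both sides used the *same* hash. Spark's default Murmur3 hash would place
    # the same keys in different buckets, which is why Spark cannot exploit
    # Hive's bucketing without understanding Hive Hash.
    for key in (0, 5, 42, -3):
        print(f"key={key} -> bucket {hive_bucket(key, 8)} of 8")
```

The point of the sketch: "taking advantage of bucketing" means trusting that the producer's bucket function matches the one the reader assumes, which is exactly what a Murmur3-only Spark cannot do for Hive-Hash-bucketed data.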