I want to compute cume_dist on a number of columns in a Spark DataFrame, but
want to remove NULL values before doing so.
I have this loop in PySpark. While it works, I see the driver running at
100% while the executors sit mostly idle. I have read that running a loop
like this is an anti-pattern.
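For illustration, cume_dist's SQL semantics (the fraction of rows whose value is less than or equal to the current row's) can be sketched in plain Python, with NULLs (None) dropped first. The data below is made up; in PySpark the per-column equivalent would be F.cume_dist().over(Window.orderBy(col)) applied after filtering nulls:

```python
# A minimal sketch of cume_dist semantics, assuming the SQL definition:
# count(rows <= current row) / count(rows), computed after NULLs are dropped.
# The input values here are hypothetical sample data.

def cume_dist(values):
    """Cumulative distribution of each non-null value in `values`."""
    clean = [v for v in values if v is not None]  # drop NULLs first
    n = len(clean)
    return {v: sum(1 for w in clean if w <= v) / n for v in clean}

print(cume_dist([10, None, 20, 20, 30]))  # {10: 0.25, 20: 0.75, 30: 1.0}
```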
Hi,
I am a bit confused here; it is not entirely clear to me why you are
creating the row numbers, or how the row numbers help you with
the joins.
Can you please explain with some sample data?
Regards,
Gourav
On Fri, Jan 7, 2022 at 1:14 AM Andrew Davidson
wrote:
> Hi
>
>
>
> I am
Hi,
As always, before answering the question, may I ask what you are
trying to achieve by storing the data in a table? How are you planning to
query binary data?
If you look at relational theory, it states that a table is a
relation/entity and the fields are its attributes. You
Hi Spark Team
When creating a database via Spark 3.0 on Hive:
1) spark.sql("create database test location '/user/hive'") creates the
database location on HDFS, as expected.
2) When running the same command on 3.1, the database is created on the
local file system by default. I have to prefix
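For what it's worth, if the behavior difference comes down to how the location string is resolved, the key distinction is whether it carries a filesystem scheme. A small plain-Python sketch (the namenode address is a placeholder, not from the original message) illustrates why a bare path is ambiguous while a fully qualified hdfs:// URI is not:

```python
from urllib.parse import urlparse

# Hypothetical illustration: a location without a scheme is resolved against
# whatever filesystem the engine treats as the default, while a fully
# qualified URI pins the target. "namenode:8020" is a placeholder address.
for loc in ["/user/hive", "hdfs://namenode:8020/user/hive"]:
    scheme = urlparse(loc).scheme or "(none: falls back to default filesystem)"
    print(loc, "->", scheme)
```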