Re: issue on define a dataframe

2021-12-14 Thread Sean Owen
Is this Python? Just try passing [("apple",), ("orange",), ...] On Tue, Dec 14, 2021 at 7:18 PM wrote: > Hello, > > Spark newbie here :) > > Why can't I create the dataframe with just one column? > > for instance, this works: > > >>>
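
A minimal sketch of the fix suggested above, assuming a standard PySpark session bound to spark:

    # Each row must be a tuple; a one-element tuple needs a trailing comma,
    # otherwise ("apple") is just the string "apple".
    df = spark.createDataFrame([("apple",), ("orange",)], ["name"])
    df.show()
    # +------+
    # |  name|
    # +------+
    # | apple|
    # |orange|
    # +------+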

issue on define a dataframe

2021-12-14 Thread bitfox
Hello, Spark newbie here :) Why can't I create the dataframe with just one column? For instance, this works: df=spark.createDataFrame([("apple",2),("orange",3)],["name","count"]) But this doesn't work: df=spark.createDataFrame([("apple"),("orange")],["name"]) Traceback (most recent call
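
An alternative sketch for the single-column case, assuming PySpark 2.x or later and a session bound to spark: pass plain strings with an explicit element type and rename the default "value" column afterwards.

    from pyspark.sql.types import StringType

    # Each list element is treated as a value of the given atomic type;
    # the resulting column is named "value" by default, so rename it.
    df = spark.createDataFrame(["apple", "orange"], StringType()).toDF("name")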

Re: spark thrift server as hive on spark running on kubernetes, and more.

2021-12-14 Thread Frank Hwa
What's the difference between DataRoaster and Dask? https://scalingpythonml.com/2020/11/03/a-first-look-at-dask-on-arm-on-k8s.html Thanks. On 2021/12/15 8:42, Kidong Lee wrote: Recently I have written a Spark operator to deploy Spark applications onto Kubernetes using custom resources. See

Re: spark thrift server as hive on spark running on kubernetes, and more.

2021-12-14 Thread Kidong Lee
Hi all, Recently I have written a Spark operator to deploy Spark applications onto Kubernetes using custom resources. See the DataRoaster Spark operator for more details: https://github.com/cloudcheflabs/dataroaster/tree/master/operators/spark Spark thrift server can be deployed more easily on

spark.read.schema return null for dataframe column values

2021-12-14 Thread Mohamed Samir
Hi, I have a small question and an issue which I hope the Spark gurus can help me with. I have a parquet file person.parquet that has multiple columns with one row. One of the columns, "Middle Name", has a space which causes an issue with Spark when writing it to parquet format. What I have done is to
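
The message is truncated above, but one common workaround for Parquet's restriction on spaces (and other special characters) in attribute names is to rename the offending columns before writing; a rough sketch with hypothetical names (df, person_cleaned.parquet):

    # Rename the single offending column...
    cleaned = df.withColumnRenamed("Middle Name", "Middle_Name")
    # ...or normalise every column name in one pass.
    cleaned = df.toDF(*[c.replace(" ", "_") for c in df.columns])
    cleaned.write.mode("overwrite").parquet("person_cleaned.parquet")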

Re: question about data skew and memory issues

2021-12-14 Thread Mich Talebzadeh
Hi David, Can you give us an example of the code you are running and the way you are aggregating over keys? HTH

question about data skew and memory issues

2021-12-14 Thread David Diebold
Hello all, I was wondering if it is possible to encounter out-of-memory exceptions on Spark executors when doing some aggregation, when a dataset is skewed. Let's say we have a dataset with two columns: - key : int - value : float And I want to aggregate values by key. Let's say that we have tons
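
Not part of the thread, but a standard mitigation for this kind of skew is a two-stage aggregation over a salted key; a rough sketch for a sum aggregation, assuming a DataFrame df with key and value columns:

    from pyspark.sql import functions as F

    N = 16  # number of salt buckets; tune to the degree of skew

    # Stage 1: partial sums per (key, salt), so a hot key is spread over N tasks.
    partial = (df
               .withColumn("salt", (F.rand() * N).cast("int"))
               .groupBy("key", "salt")
               .agg(F.sum("value").alias("partial_sum")))

    # Stage 2: combine the partial sums per key; each key now has at most N rows.
    result = partial.groupBy("key").agg(F.sum("partial_sum").alias("total"))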

Re: Log4j 1.2.17 spark CVE

2021-12-14 Thread Sean Owen
FWIW here is the Databricks statement on it. Not the same as Spark, but it includes Spark of course. https://databricks.com/blog/2021/12/13/log4j2-vulnerability-cve-2021-44228-research-and-assessment.html Yes, the question is almost surely more about whether user apps are affected, not Spark itself. On

Re: Log4j 1.2.17 spark CVE

2021-12-14 Thread Steve Loughran
Log4j 1.2.17 is not vulnerable. There is an existing CVE there from a log aggregation servlet; Cloudera products ship a patched release with that servlet stripped...ASF projects are not allowed to do that. But: some recent Cloudera products do include log4j 2.x, so colleagues of mine are busy