Hi Swetha,
I also had the same requirement: reading JSON from Kafka and writing it
back in Parquet format.
I used a workaround:
1. Inferred the schema using the batch API by reading the first few rows.
2. Started streaming using the schema inferred in step 1.
*Limitation*: Will not work if you s
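The two steps above can be sketched in PySpark roughly as follows; the function name, paths, topic, and server names are placeholders for illustration, not from the original mail:

```python
def start_kafka_json_stream(spark, sample_path, servers, topic, out_path, ckpt_path):
    """Sketch of the workaround: infer the schema in batch, then stream with it."""
    from pyspark.sql.functions import from_json, col

    # Step 1: infer the schema with the batch API from a few sample rows
    schema = spark.read.json(sample_path).schema

    # Step 2: start the stream, parsing each Kafka message with that schema
    parsed = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", servers)
              .option("subscribe", topic)
              .load()
              .select(from_json(col("value").cast("string"), schema).alias("data"))
              .select("data.*"))

    # Write back in Parquet format
    return (parsed.writeStream
            .format("parquet")
            .option("path", out_path)
            .option("checkpointLocation", ckpt_path)
            .start())
```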
Thanks Amiya/TD for responding.
@TD,
Thanks for letting us know about this new foreachBatch API; this handle on
the per-batch DataFrame should be useful in many cases.
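As a minimal sketch of that handle (assuming Spark 2.4+, where foreachBatch is available; the sink path is a placeholder):

```python
def write_each_batch(batch_df, batch_id):
    # batch_df is a plain batch DataFrame, so any batch sink works here,
    # e.g. appending each micro-batch to Parquet
    batch_df.write.mode("append").parquet("/tmp/output")

# Wiring it into a streaming query (sketch):
# query = (stream_df.writeStream
#          .foreachBatch(write_each_batch)
#          .option("checkpointLocation", "/tmp/ckpt")
#          .start())
```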
@Amiya,
The input source will be read twice, and the entire DAG computation will be
done twice. Not a limitation, but resource utilisation and p
Any one?
Nirav,
Spark does not create a duplicate column when you give the join expression
as a list of column name(s), as below, but that requires the column name to
be the same in both DataFrames.
Example: df1.join(df2, ['a'])
Thanks.
Vamshi Talla
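A small sketch of this behaviour (column names invented for illustration):

```python
def join_without_duplicates(spark):
    df1 = spark.createDataFrame([(1, "x")], ["a", "b"])
    df2 = spark.createDataFrame([(1, "y")], ["a", "c"])
    # A column-expression join like df1["a"] == df2["a"] keeps both 'a'
    # columns; passing a list of column names keeps only one
    return df1.join(df2, ["a"]).columns
```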
On Jul 6, 2018, at 4:47 PM, Gokula Krishnan D
Hi Ravi,
RDDs are always immutable, so you cannot change them; instead, you create new
ones by transforming existing ones. repartition is a transformation, so it is
lazily evaluated and hence computed only when you call an action on it.
Thanks.
Vamshi Talla
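Both points can be sketched as follows, assuming an active SparkSession named spark:

```python
def repartition_is_lazy(spark):
    df = spark.range(10)        # some existing DataFrame
    newdf = df.repartition(3)   # returns a NEW DataFrame; df itself is unchanged
    # No shuffle has happened yet; it runs only when an action
    # (count, collect, write, ...) is called on newdf
    return df.rdd.getNumPartitions(), newdf.rdd.getNumPartitions()
```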
On Jul 8, 2018, at 12:26 PM, mailto:ryanda...@gmail.c
When you run on YARN, you don't even need to start a Spark standalone cluster
(Spark master and slaves). YARN receives a job and then allocates resources
for the application master and then its workers.
Check the resources available in the node section of the ResourceManager UI
(and is your node actually
Hi,
Can anyone clarify how repartition works, please?
* I have a DataFrame df which has only one partition:

  df.rdd.getNumPartitions // Returns 1

* I repartitioned it by passing "3" and assigned it to a new DataFrame newdf:

  val newdf = df.repartition(3)

* ne
@yohann Sorry, I am assuming you meant the application master; if so, I
believe Spark is the one that provides the application master. Is there any
way to see how many resources are being requested and how many YARN is
allowed to provide? I would assume this is a common case, so I am not sure
why yarn.scheduler.capacity.maximum-am-resource-percent is set to 0.1 by
default. I tried changing it to 1.0 and still no luck; the same problem
persists. The master here is YARN, and I am just trying to spawn spark-shell
--master yarn --deploy-mode client and run a simple word count, so I am not
sure why i
Following are the logs from the resource manager:
2018-07-08 07:23:23,382 WARN
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
maximum-am-resource-percent is insufficient to start a single application in
queue, it is likely set too low. skipping enforcement to allow at l
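The warning points at the capacity scheduler's application-master limit. A sketch of raising it in capacity-scheduler.xml (the 0.5 value is just an example, not from the thread):

```xml
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.5</value>
  <!-- Fraction of cluster resources that may go to application masters;
       the default 0.1 can be too low on a small or single-node cluster -->
</property>
```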
From Stack Overflow:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType
from pyspark.sql.types import StructField
from pyspark.sql.types import StringType

sc = SparkContext(conf=SparkConf())
spark = SparkSession(sc)  # Need to use SparkSession(sc) to createDataFrame

schema = StructType([
    StructFiel
Are you able to run a simple MapReduce job on YARN without any issues?
If you have any issues: I had this problem on a Mac. Use csrutil on the Mac
to disable it (System Integrity Protection), then add a softlink:

sudo ln -s /usr/bin/java /bin/java

The newer versions of macOS, from El Capitan onwards, do not allow softlinks
in /bin/java otherwise.
Hi Dimitris,
Could you explain your use case in a bit more detail?
What you are asking for, if I understand you correctly, is not the advised
way to go about it.
If you're running analytics and expect their output to be a DataFrame with
the specified columns, then you should compose your queries i
Hi,
It's on a local MacBook Pro with 16GB RAM, a 512GB disk, and 8 vCPUs!
I am not running any code, since I can't even spawn spark-shell with YARN as
master, as described in my previous email. I just want to run a simple word
count using YARN as master.
Thanks!
Below is the resource manager l
Are you running on EMR? Have you checked the EMR logs?
I was in a similar situation where a job was stuck in ACCEPTED and then it
died... it turned out to be an issue with my code when running with huge
data. Perhaps try gradually reducing the load until it works, and then start
from there?
Not a huge help, but I followed
Hi All,
I am trying to run a simple word count using YARN as the cluster manager. I
am currently using Spark 2.3.1 and Apache Hadoop 2.7.3. When I spawn
spark-shell as below, it gets stuck in the ACCEPTED state forever.
./bin/spark-shell --master yarn --deploy-mode client
I set my log4j.propertie
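For reference, a word count of the kind described can be sketched in PySpark (the file path is a placeholder):

```python
def word_count(spark, path):
    from operator import add
    lines = spark.read.text(path)  # one row per line, in column 'value'
    return (lines.rdd
            .flatMap(lambda row: row.value.split())
            .map(lambda word: (word, 1))
            .reduceByKey(add)
            .collect())
```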