Re: Spark Job Performance monitoring approaches

2017-02-15 Thread Chetan Khatri
Thank you Georg. On Thu, Feb 16, 2017 at 12:30 PM, Georg Heiler wrote: > I know of the following tools: https://sites.google.com/site/sparkbigdebug/home https://engineering.linkedin.com/blog/2016/04/dr-elephant-open-source-self-serve-performance-tuning-hadoop-spark

Re: physical memory usage keep increasing for spark app on Yarn

2017-02-15 Thread Yang Cao
Hi Pavel! Sorry for the late reply. I have just done some investigation these days with my colleague. Here is my thought: since Spark 1.2, Netty with off-heap memory is used to reduce GC during shuffle and cache block transfer. In my case, if I increase the memory overhead enough, I will get the Max
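A minimal sketch of the knob under discussion, assuming a Spark 1.x/2.x deployment on YARN where the off-heap headroom is set via spark.yarn.executor.memoryOverhead (the values below are purely illustrative, not recommendations):

```scala
import org.apache.spark.SparkConf

// Illustrative only: raise the per-executor off-heap headroom on YARN.
// spark.yarn.executor.memoryOverhead is given in MB.
val conf = new SparkConf()
  .setAppName("memory-overhead-example")
  .set("spark.executor.memory", "4g")                // on-heap executor memory
  .set("spark.yarn.executor.memoryOverhead", "1024") // extra MB for Netty/off-heap buffers
```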

Re: Spark Job Performance monitoring approaches

2017-02-15 Thread Georg Heiler
I know of the following tools: https://sites.google.com/site/sparkbigdebug/home https://engineering.linkedin.com/blog/2016/04/dr-elephant-open-source-self-serve-performance-tuning-hadoop-spark https://github.com/SparkMonitor/varOne https://github.com/groupon/sparklint Chetan Khatri

Spark Job Performance monitoring approaches

2017-02-15 Thread Chetan Khatri
Hello All, What would be the best approaches to monitor Spark performance? Are there any tools for Spark job performance monitoring? Thanks.

Remove .HiveStaging files

2017-02-15 Thread KhajaAsmath Mohammed
Hi, I am using Spark temporary tables to write data back to Hive. I have seen weird behavior with .hive-staging files remaining after job completion. Does anyone know how to delete them, or prevent them from being created, while writing data into Hive? Thanks, Asmath
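One possible direction, sketched under the assumption that relocating the staging directory is acceptable: Hive's hive.exec.stagingdir property controls where .hive-staging files are written, so pointing it at a scratch path keeps them out of the table directory and makes leftovers easy to purge. The path below is illustrative.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: relocate Hive staging files to a scratch location.
// hive.exec.stagingdir is a standard Hive property; the path is illustrative.
val spark = SparkSession.builder()
  .appName("hive-staging-relocation")
  .config("hive.exec.stagingdir", "/tmp/hive-scratch/.hive-staging")
  .enableHiveSupport()
  .getOrCreate()
```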

[Spark Streaming WAL] custom java streaming receiver and the WAL

2017-02-15 Thread Charles O. Bajomo
Hello all, I am having some problems with my custom Java-based receiver. I am running Spark 1.5.0 and I used the template on the Spark website (http://spark.apache.org/docs/1.0.0/streaming-custom-receivers.html). Basically my receiver listens to a JMS queue (Solace) and then, based on the size
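For reference, a minimal sketch of that receiver template (shown in Scala for brevity; the Java API mirrors it). The pollQueue helper is a hypothetical stand-in for the JMS/Solace consumer call, and WAL durability additionally requires spark.streaming.receiver.writeAheadLog.enable=true plus checkpointing:

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Sketch of a push-based custom receiver; pollQueue is hypothetical.
class JmsReceiver(connectionUrl: String)
    extends Receiver[String](StorageLevel.MEMORY_AND_DISK_SER) {

  def onStart(): Unit = {
    val t = new Thread("jms-receiver") {
      override def run(): Unit = {
        while (!isStopped()) {
          val msg = pollQueue()
          if (msg != null) store(msg) // store() hands data to Spark (and the WAL, if enabled)
        }
      }
    }
    t.setDaemon(true)
    t.start()
  }

  def onStop(): Unit = {} // the polling loop observes isStopped()

  private def pollQueue(): String = null // placeholder for the actual JMS receive call
}
```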

Re: Enrichment with static tables

2017-02-15 Thread Sam Elamin
You can do a join or a union to combine all the dataframes into one fat dataframe, or do a select on the columns you want to produce your transformed dataframe. Not sure if I understand the question though; if the goal is just an end-state transformed dataframe, that can easily be done. Regards, Sam
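A sketch of the join-based enrichment Sam describes; the table and column names are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("enrichment").getOrCreate()

// Hypothetical inputs: a fact dataframe and a static lookup dataframe.
val events = spark.table("events")       // e.g. columns: user_id, action
val users  = spark.table("users_static") // e.g. columns: user_id, country

// A left join keeps every event row and attaches lookup columns where keys match.
val enriched = events.join(users, Seq("user_id"), "left")
```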

Enrichment with static tables

2017-02-15 Thread Gaurav Agarwal
Hello, We want to enrich our Spark RDD, loaded with multiple columns and multiple rows. It needs to be enriched with 3 different tables that I loaded into 3 different Spark dataframes. Can we write some logic in Spark so I can enrich my Spark RDD with these different static tables? Thanks

Re: notebook connecting Spark On Yarn

2017-02-15 Thread Jon Gregg
Could you just make Hadoop's resource manager (port 8088) available to your users, so they can check available containers if they see the launch is stalling? Another option is to reduce the default # of executors and memory per executor in the launch script to some small fraction of
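A hedged illustration of that second suggestion; the values are arbitrary examples of a small per-notebook footprint, not recommendations:

```scala
import org.apache.spark.SparkConf

// Example only: a small default footprint per notebook session on YARN.
val conf = new SparkConf()
  .set("spark.executor.instances", "2") // down from a larger site-wide default
  .set("spark.executor.memory", "1g")
  .set("spark.executor.cores", "1")
```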

Regarding transformation with dataframe

2017-02-15 Thread Gaurav Agarwal
Hello, I have loaded 3 dataframes with 3 different static tables. Now I got the CSV file and, with the help of Spark, I loaded the CSV into a dataframe and registered it as a temporary table named "Employee". Now I need to enrich the columns in the Employee DF and query any of the 3 static tables respectively with
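A sketch of the temp-table pattern being described, with hypothetical paths and column names (Spark 2.x API shown; on Spark 1.x the call is registerTempTable):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("employee-enrichment").getOrCreate()

// Hypothetical path and names throughout.
val employeeDF = spark.read.option("header", "true").csv("/data/employee.csv")
val deptDF     = spark.table("static_department") // one of the pre-loaded static tables

employeeDF.createOrReplaceTempView("Employee")
deptDF.createOrReplaceTempView("Department")

// Enrich the Employee rows by joining against a static table in SQL.
val enriched = spark.sql(
  """SELECT e.*, d.dept_name
    |FROM Employee e
    |LEFT JOIN Department d ON e.dept_id = d.dept_id""".stripMargin)
```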

Re: How to specify default value for StructField?

2017-02-15 Thread Yong Zhang
If it works under Hive, did you try just creating the DF from the Hive table directly in Spark? That should work, right? Yong From: Begar, Veena Sent: Wednesday, February 15, 2017 10:16 AM To: Yong Zhang; smartzjp; user@spark.apache.org Subject:
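For reference, a minimal sketch of that suggestion: reading the Hive table straight into a dataframe, so Hive resolves the schema before Spark sees the rows (the table name is hypothetical):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-table-read")
  .enableHiveSupport()
  .getOrCreate()

// Hive's metastore supplies the schema, including any Avro schema handling it does.
val df = spark.table("mydb.avro_table") // hypothetical database.table
```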

Latest Release of Receiver based Kafka Consumer for Spark Streaming.

2017-02-15 Thread Dibyendu Bhattacharya
Hi, Released the latest version of the Receiver-based Kafka Consumer for Spark Streaming. Available at Spark Packages: https://spark-packages.org/package/dibbhatt/kafka-spark-consumer Also at GitHub: https://github.com/dibbhatt/kafka-spark-consumer Some key features - Tuned for better

RE: How to specify default value for StructField?

2017-02-15 Thread Begar, Veena
Thanks Yong. I know about the schema-merging option. Using Hive we can read AVRO files having different schemas, and we can do the same in Spark. Similarly, we can read ORC files having different schemas in Hive, but we can’t do the same in Spark using a dataframe. How can we do it

Query data in subdirectories in Hive Partitions using Spark SQL

2017-02-15 Thread Ahmed Kamal Abdelfatah
Hi folks, How can I force Spark SQL to recursively get data stored in parquet format from subdirectories? In Hive, I could achieve this by setting a few Hive configs: set hive.input.dir.recursive=true; set hive.mapred.supports.subdirectories=true; set hive.supports.subdirectories=true; set
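For reference, a sketch of where such flags are usually applied in a Spark session, under the assumption that the Hive-aware reader honors them (the Hadoop property is a standard MapReduce setting; whether each flag takes effect in Spark SQL is exactly the open question here):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

// Hadoop-level recursive input flag (standard MapReduce property).
spark.sparkContext.hadoopConfiguration
  .set("mapreduce.input.fileinputformat.input.dir.recursive", "true")

// Hive-level flags, mirroring the settings quoted above.
spark.sql("SET hive.input.dir.recursive=true")
spark.sql("SET hive.mapred.supports.subdirectories=true")
spark.sql("SET hive.supports.subdirectories=true")
```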

notebook connecting Spark On Yarn

2017-02-15 Thread Sachin Aggarwal
Hi, I am trying to create multiple notebooks connecting to Spark on YARN. After starting a few jobs, my cluster ran out of containers. All new notebook requests are in a busy state, as the Jupyter kernel gateway is not getting any containers for the master to be started. Some jobs are not leaving the

Re: extracting eventlogs saved snappy format.

2017-02-15 Thread Jörn Franke
What do you want to do with the event log? The Hadoop command line can show compressed files (hadoop fs -text). Alternatively there are tools depending on your OS ... you can also write a small job to do this and run it on the cluster. > On 15 Feb 2017, at 10:55, satishl

extracting eventlogs saved snappy format.

2017-02-15 Thread satishl
What is the right way to unzip a Spark app eventlog saved in snappy format (.snappy)? Are there any libraries we can use to do this programmatically?
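A sketch of doing it programmatically with Hadoop's codec factory (the same machinery behind hadoop fs -text); the path is hypothetical, and this assumes the file was written with a Hadoop-registered snappy codec and that the native snappy library is available:

```scala
import java.io.{BufferedReader, InputStreamReader}

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.io.compress.CompressionCodecFactory

// Hypothetical path; the codec factory picks SnappyCodec from the .snappy extension.
val conf = new Configuration()
val path = new Path("hdfs:///spark-events/application_1234.snappy")
val fs   = FileSystem.get(path.toUri, conf)

val codec  = new CompressionCodecFactory(conf).getCodec(path)
val stream = codec.createInputStream(fs.open(path))

// Event logs are JSON lines; print the first few to verify the decode.
val reader = new BufferedReader(new InputStreamReader(stream))
Iterator.continually(reader.readLine()).takeWhile(_ != null).take(5).foreach(println)
reader.close()
```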

What is the practical use of "Peak Execution Memory" in Spark App Resource tuning

2017-02-15 Thread satishl
The question is in the title. Can the metric "Peak Execution Memory" be used for Spark app resource tuning? If yes, how? If not, what purpose does it serve when debugging apps?

Spark executor memory and jvm heap memory usage metric

2017-02-15 Thread satishl
We have been measuring JVM heap memory usage in our Spark app by taking periodic samples of JVM heap usage and saving them in our metrics DB. We do this by spawning a thread in the Spark app and measuring the JVM heap usage every 1 min. Is it a fair assumption to conclude that if the
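A self-contained sketch of the sampling loop described above, using only the JDK (the println stands in for the metrics-DB write):

```scala
import java.lang.management.ManagementFactory
import java.util.concurrent.{Executors, TimeUnit}

// Sample JVM heap usage once a minute from a background thread.
val heapBean  = ManagementFactory.getMemoryMXBean
val scheduler = Executors.newSingleThreadScheduledExecutor()

scheduler.scheduleAtFixedRate(new Runnable {
  def run(): Unit = {
    val usedMb = heapBean.getHeapMemoryUsage.getUsed / (1024 * 1024)
    println(s"heap used: $usedMb MB") // stand-in for the metrics-DB write
  }
}, 0, 1, TimeUnit.MINUTES)
```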

Re: How to get a spark sql statement implement duration ?

2017-02-15 Thread ??????????
You can find the duration via the web UI, such as http://xxx:8080 (it depends on your settings). About the shell, I do not know how to check the time. ---Original--- From: "Jacek Laskowski" Date: 2017/2/8 04:14:58 To: "Mars Xu"; Cc:

Re: using spark-xml_2.10 to extract data from XML file

2017-02-15 Thread Carlo . Allocca
Hi Hyukjin, Thank you very much for this. Sure, I am going to do it today based on the data + Java code. Many thanks for the support. Best Regards, Carlo On 15 Feb 2017, at 00:22, Hyukjin Kwon wrote: Hi Carlo, There was a bug in lower versions

Re: is dataframe thread safe?

2017-02-15 Thread vincent gromakowski
I would like to have your opinion about an idea I had... I am thinking of addressing the issue of interactive queries on small/medium datasets (max 500 GB or 1 TB) with a solution based on the Thrift server and Spark cache management. Currently, the problem with caching the dataset in Spark is that you
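For context, a minimal sketch of the cache management involved: CACHE TABLE is standard Spark SQL, so it can be issued through the Thrift server as well (the table name is hypothetical):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

// Pin a table in memory so subsequent interactive queries are served from cache.
spark.sql("CACHE TABLE events")                       // eager: materializes the cache
val counts = spark.sql("SELECT count(*) FROM events") // reads the cached data
// spark.sql("UNCACHE TABLE events")                  // release when finished
```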

Re: is dataframe thread safe?

2017-02-15 Thread ??????????
Updating a dataframe returns a NEW dataframe, like an RDD, right? ---Original--- From: "vincent gromakowski" Date: 2017/2/14 01:15:35 To: "Reynold Xin"; Cc: "user";"Mendelson, Assaf"; Subject: Re: is
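That is how the API behaves: transformations return a new dataframe and leave the original untouched. A small sketch (the column names are made up):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder().appName("df-immutability").getOrCreate()
import spark.implicits._

val df  = Seq((1, "a"), (2, "b")).toDF("id", "tag")
val df2 = df.withColumn("source", lit("static")) // returns a NEW dataframe

df.printSchema()  // still two columns: the original is unchanged
df2.printSchema() // three columns
```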