Re: Error when loading json to spark

2016-12-31 Thread Miguel Morales
Looks like it's trying to treat that path as a folder; try omitting the file name and using just the folder path.

On Sat, Dec 31, 2016 at 7:58 PM, Raymond Xie wrote:
> Happy new year!!!
>
> I am trying to load a json file into spark, the json file is attached here.
>
> I
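A hedged sketch of the suggestion above (assuming pyspark 2.x is installed and a local SparkSession can be created; the directory path is hypothetical):

```python
# Sketch only: requires a working Spark installation.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").appName("json-load").getOrCreate()

# Pointing read.json at the directory makes Spark read every JSON file
# inside it; a path to a single file also works, but the path must then
# actually be a file, not a directory.
df = spark.read.json("/data/json_dir/")  # hypothetical path
df.printSchema()
```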

Re: Custom delimiter file load

2016-12-31 Thread Nicholas Hakobian
See the documentation for the options accepted by the csv function:
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameReader@csv(paths:String*):org.apache.spark.sql.DataFrame
The options can be passed to the DataFrameReader class with the option/options functions.
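A hedged illustration of passing those options (assuming a Spark 2.x SparkSession named `spark`; the file path and option values are made up for the example):

```python
# Sketch only: requires a working Spark installation; path is hypothetical.
df = (spark.read
      .option("sep", "|")        # custom field delimiter
      .option("header", "true")  # treat the first line as column names
      .csv("/data/input.psv"))

# Equivalently, several options can be set at once:
df = spark.read.options(sep="|", header="true").csv("/data/input.psv")
```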

Re: From Hive to Spark, what is the default database/table

2016-12-31 Thread Mich Talebzadeh
flight201601 is the name of a database (schema); it is not a TABLE! In Hive you can run "show databases" to see the list of databases. By default, Hive has a database called "default" out of the box. For example, to see the list of tables in database flight201601, do the following: use flight201601;
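A short sketch of issuing those Hive commands from Spark (assuming pyspark 2.x with Hive support enabled; the database name comes from the thread):

```python
# Sketch only: requires a Spark installation configured against a Hive metastore.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

spark.sql("SHOW DATABASES").show()  # lists databases, including "default"
spark.sql("USE flight201601")       # switch the current database
spark.sql("SHOW TABLES").show()     # tables inside flight201601
```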

From Hive to Spark, what is the default database/table

2016-12-31 Thread Raymond Xie
Hello, It is indicated in https://spark.apache.org/docs/1.6.1/sql-programming-guide.html#dataframes that when Running SQL Queries Programmatically you can do:

from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
df = sqlContext.sql("SELECT * FROM table")

However, it did not indicate what
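A hedged sketch of the Spark 1.6-style flow the guide describes (assuming pyspark 1.6 with an existing SparkContext `sc`; the database and table names are hypothetical). An unqualified table name resolves against Hive's "default" database; qualify it as `<database>.<table>` to target another one:

```python
# Sketch only: requires Spark 1.6 with a running SparkContext named sc.
from pyspark.sql import HiveContext  # HiveContext can see Hive's metastore

sqlContext = HiveContext(sc)

# Hypothetical names: an unqualified "some_table" would be looked up
# in the "default" database instead.
df = sqlContext.sql("SELECT * FROM some_database.some_table")
df.show()
```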

Re: How to load a big csv to dataframe in Spark 1.6

2016-12-31 Thread Felix Cheung
Hmm, this would seem unrelated? Does it work on the same box without the package? Do you have more of the error stack you can share?

From: Raymond Xie
Sent: Saturday, December 31, 2016 8:09 AM
Subject: Re: How to

Custom delimiter file load

2016-12-31 Thread A Shaikh
In PySpark 2, loading a file with any delimiter into a DataFrame is pretty straightforward: spark.read.csv(file, schema=, sep='|'). Is there something similar in Spark 2 in Scala? spark.read.csv(path, sep='|')?

Re: How to load a big csv to dataframe in Spark 1.6

2016-12-31 Thread Raymond Xie
Hello Felix, I followed the instruction and ran the command:

> $SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.11:1.5.0

and I received the following error message:

java.lang.RuntimeException: java.net.ConnectException: Call From xie1/192.168.112.150 to localhost:9000 failed

Re: [ML] Converting ml.DenseVector to mllib.Vector

2016-12-31 Thread Peyman Mohajerian
This may also help: http://spark.apache.org/docs/latest/ml-migration-guides.html

On Sat, Dec 31, 2016 at 6:51 AM, Marco Mistroni wrote:
> Hi. you have a DataFrame.. there should be either a way to
> - convert a DF to a Vector without doing a cast
> - use a ML library

Re: [ML] Converting ml.DenseVector to mllib.Vector

2016-12-31 Thread Marco Mistroni
Hi. You have a DataFrame.. there should be either a way to
- convert a DF to a Vector without doing a cast
- use a ML library which relies on DataFrames only

I can see that your code is still importing libraries from two different 'machine learning' packages import

New runtime exception after switch to Spark 2.1.0

2016-12-31 Thread mhornbech
Hi

We just tested a switch from Spark 2.0.2 to Spark 2.1.0 on our codebase. It compiles fine, but introduces the following runtime exception upon initialization of our Cassandra database. I can't find any clues in the release notes. Has anyone experienced this?

Morten

sbt.ForkMain$ForkError:

Re: [TorrentBroadcast] Pyspark Application terminated saying "Failed to get broadcast_1_ piece0 of broadcast_1 in Spark 2.0.0"

2016-12-31 Thread Palash Gupta
Hi Marco, Thanks! Please find my responses below:

so you have a pyspark application running on spark 2.0
Palash>> Yes
You have python scripts dropping files on HDFS
Palash>> Yes (it is not part of spark process, just independent python script)
then you have two spark jobs
Palash>> Yes - 1 load expected