Re: Spark 2.0 Shell -csv package weirdness

2016-03-20 Thread Marco Mistroni
Hi, I'll try the same settings as you tomorrow to see if I run into the same issues, and will report back once done. Thanks On 20 Mar 2016 3:50 pm, "Vincent Ohprecio" wrote: > Thanks Mich and Marco for your help. I have created a ticket to look into > it on dev channel. > Here is the

Re: Spark 2.0 Shell -csv package weirdness

2016-03-20 Thread Vincent Ohprecio
Thanks Mich and Marco for your help. I have created a ticket on the dev channel to look into it. Here is the issue: https://issues.apache.org/jira/browse/SPARK-14031 On Sun, Mar 20, 2016 at 2:57 AM, Mich Talebzadeh wrote: > Hi Vincent, > > I downloaded the CSV file and did

Re: Spark 2.0 Shell -csv package weirdness

2016-03-20 Thread Mich Talebzadeh
Hi Vincent, I downloaded the CSV file and did the test on Spark version 1.5.2. The full code is as follows, with minor changes to delete the yearAndCancelled.parquet and output.csv files if they have already been created:
//$SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.11:1.3.0
val HiveContext =
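Mich's actual cleanup code is cut off in this archive. A minimal sketch of one way to do it in Scala, assuming the outputs live on the local filesystem: Spark writes yearAndCancelled.parquet and output.csv as directories of part files, so the delete has to be recursive. On HDFS, Hadoop's `FileSystem.delete(path, true)` does the same job. `OutputCleanup.deleteIfExists` is a hypothetical helper name, not something from the thread.

```scala
import java.nio.file.{Files, Path, Paths}
import java.util.Comparator

object OutputCleanup {
  // Hypothetical helper: recursively deletes a file or directory if it exists.
  // Returns true if something was deleted, false if the path was not there.
  def deleteIfExists(location: String): Boolean = {
    val root: Path = Paths.get(location)
    if (!Files.exists(root)) false
    else {
      val stream = Files.walk(root)
      try {
        // Reverse order visits children before their parent directory,
        // so every directory is already empty when it is deleted
        stream.sorted(Comparator.reverseOrder[Path]()).forEach(p => Files.delete(p))
      } finally stream.close()
      true
    }
  }
}
```

In spark-shell this would run before the writes, e.g. `OutputCleanup.deleteIfExists("output.csv")`. An alternative that avoids manual deletes is writing with `df.write.mode("overwrite")`, which replaces existing output.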

Re: Spark 2.0 Shell -csv package weirdness

2016-03-19 Thread Mich Talebzadeh
Hi Vince, We had a similar case a while back. I tried two solutions, in both Spark on the Hive metastore and Hive on the Spark engine (Hive version 2, Spark 1.3.1 as the Hive engine). Basically:
--1 Move .CSV data into HDFS:
--2 Create an external table (all columns as string)
--3 Create the ORC table
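Steps 2 and 3 of Mich's recipe could look roughly like the HiveQL generated below. This is a sketch under assumptions: the table names, column names and types, HDFS location and comma delimiter are all hypothetical, step 1 is assumed to have been done already (e.g. with `hdfs dfs -put`), and the rest of his message is cut off in this archive, so only the two CREATE statements are shown. The strings could be passed to a `HiveContext` via `sql(...)` or pasted into Hive.

```scala
object CsvToOrcDdl {
  // Step 2: external staging table over the raw CSV, every column as STRING
  def externalStringTable(name: String, columns: Seq[String], location: String): String = {
    val cols = columns.map(c => s"  $c STRING").mkString(",\n")
    s"""CREATE EXTERNAL TABLE IF NOT EXISTS $name (
       |$cols
       |)
       |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
       |STORED AS TEXTFILE
       |LOCATION '$location'""".stripMargin
  }

  // Step 3: managed ORC table; (column, type) pairs are chosen by the caller
  def orcTable(name: String, columns: Seq[(String, String)]): String = {
    val cols = columns.map { case (c, t) => s"  $c $t" }.mkString(",\n")
    s"""CREATE TABLE IF NOT EXISTS $name (
       |$cols
       |)
       |STORED AS ORC""".stripMargin
  }
}
```

Keeping the staging table all-STRING means malformed CSV values never fail at load time; the conversion to real types happens only when data moves into the ORC table.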

Re: Fwd: Spark 2.0 Shell -csv package weirdness

2016-03-19 Thread Marco Mistroni
Have you tried df.saveAsParquetFile? I think that method is on the DataFrame API. HTH Marco On 19 Mar 2016 7:18 pm, "Vincent Ohprecio" wrote: > > For some reason writing data from Spark shell to csv using the `csv > package` takes almost an hour to dump to disk. Am I going crazy or did I
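`df.saveAsParquetFile(path)` was deprecated in Spark 1.4 in favour of the `DataFrameWriter` API, `df.write.parquet(path)`. A minimal sketch of writing the same DataFrame both ways through that writer, assuming a released Spark 2.x where CSV is a built-in source; the paths, column names and sample rows are hypothetical stand-ins for the flight data in the thread. With the spark-csv package on 1.x, the CSV line would instead be `df.write.format("com.databricks.spark.csv").option("header", "true").save(path)`.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object WriteBothFormats {
  // Writes the same DataFrame as Parquet and as CSV, replacing any old output.
  def writeBoth(df: DataFrame, parquetPath: String, csvPath: String): Unit = {
    // Replacement for the deprecated df.saveAsParquetFile(parquetPath)
    df.write.mode(SaveMode.Overwrite).parquet(parquetPath)
    // Built-in CSV source in Spark 2.x, header row included
    df.write.mode(SaveMode.Overwrite).option("header", "true").csv(csvPath)
  }

  // Hypothetical sample data, not the dataset from the thread
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq((2008, 0), (2008, 1), (2007, 0)).toDF("Year", "Cancelled")
  }
}
```

In spark-shell, where `spark` already exists, this is `WriteBothFormats.writeBoth(WriteBothFormats.sample(spark), "/tmp/yearAndCancelled.parquet", "/tmp/output.csv")`.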

Fwd: Spark 2.0 Shell -csv package weirdness

2016-03-19 Thread Vincent Ohprecio
For some reason, writing data from the Spark shell to CSV using the `csv package` takes almost an hour to dump to disk. Am I going crazy, or did I do this wrong? I tried writing to Parquet first and it's as fast as normal. On my MacBook Pro (16 GB, 2.2 GHz Intel Core i7, 1 TB) the machine's CPUs go crazy and
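To put numbers on "almost an hour" versus "fast as normal", it helps to time each write on its own. A minimal sketch of a wall-clock timing helper in plain Scala; `WriteTimer.timed` is a hypothetical name. Wall-clock time covers everything the write triggers, so this only shows which write is slow, not why.

```scala
object WriteTimer {
  // Hypothetical helper: evaluates `body` once, prints the elapsed
  // wall-clock time, and returns (result, elapsed milliseconds).
  def timed[A](label: String)(body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    println(s"$label took $elapsedMs ms")
    (result, elapsedMs)
  }
}

// Usage in spark-shell (df and paths are whatever you already have):
//   WriteTimer.timed("parquet")(df.write.parquet("yearAndCancelled.parquet"))
//   WriteTimer.timed("csv")(df.write.format("com.databricks.spark.csv")
//     .option("header", "true").save("output.csv"))
```

The by-name parameter `body: => A` is what lets the write expression be passed in unevaluated, so the clock starts before the write begins.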