Recall that the input isn't actually read until you do something that forces
evaluation, like calling saveAsTextFile. You didn't show the whole stack trace
here, but the exception probably occurred while parsing an input line where
one of your Long fields is actually an empty string.

Because this is such a common problem, I usually define a "parse" method
that converts each input line to the desired schema. It catches parse
exceptions like this and, at a minimum, reports the bad line. If you can
substitute a default Long in this case, say 0, the parse method can still
return a usable record instead of failing the whole job.
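
Something like this minimal sketch (Record and its fields are just
placeholders here, not your real schema; adapt it to your full ROW_A):

case class Record(tstamp: Long, usidan: String)

def parse(line: String): Record = {
  val p = line.split('|')  // Char overload; split("|") treats the argument as a regex
  val tstamp =
    try p(0).trim.toLong
    catch {
      case _: NumberFormatException =>
        println(s"Bad TSTAMP '${p(0)}' in line: $line")  // report the bad line
        0L                                               // fall back to a default
    }
  Record(tstamp, p(1))
}

// Then build the RDD with: sc.textFile("/Myhome/SPARK/files/table_a_file.txt").map(parse)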

dean



Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
<http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com

On Wed, Mar 25, 2015 at 11:48 AM, BASAK, ANANDA <ab9...@att.com> wrote:

>  Thanks. This library is only available with Spark 1.3. I am using
> version 1.2.1. Before I upgrade to 1.3, I want to try what can be done in
> 1.2.1.
>
>
>
> So I am using the following:
>
> val MyDataset = sqlContext.sql("my select query")
>
>
>
> MyDataset.map(t =>
> t(0)+"|"+t(1)+"|"+t(2)+"|"+t(3)+"|"+t(4)+"|"+t(5)).saveAsTextFile("/my_destination_path")
>
>
>
> But it is giving following error:
>
> 15/03/24 17:05:51 ERROR Executor: Exception in task 1.0 in stage 13.0 (TID
> 106)
>
> java.lang.NumberFormatException: For input string: ""
>
>         at
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>
>         at java.lang.Long.parseLong(Long.java:453)
>
>         at java.lang.Long.parseLong(Long.java:483)
>
>         at
> scala.collection.immutable.StringLike$class.toLong(StringLike.scala:230)
>
>
>
> Is there something wrong with the TSTAMP field, which is of Long datatype?
>
>
>
> Thanks & Regards
>
> -----------------------
>
> Ananda Basak
>
> Ph: 425-213-7092
>
>
>
> *From:* Yin Huai [mailto:yh...@databricks.com]
> *Sent:* Monday, March 23, 2015 8:55 PM
>
> *To:* BASAK, ANANDA
> *Cc:* user@spark.apache.org
> *Subject:* Re: Date and decimal datatype not working
>
>
>
> To store to a CSV file, you can use the Spark-CSV
> <https://github.com/databricks/spark-csv> library.
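>
> For example, with Spark 1.3 it would look roughly like this (the paths and
> option names are illustrative; see the project README for the exact API):
>
>   import org.apache.spark.sql.SQLContext
>   val sqlContext = new SQLContext(sc)
>   // read a CSV file into a DataFrame
>   val df = sqlContext.load("com.databricks.spark.csv",
>     Map("path" -> "cars.csv", "header" -> "true"))
>   // write the DataFrame back out as CSV
>   df.save("newcars.csv", "com.databricks.spark.csv")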
>
>
>
> On Mon, Mar 23, 2015 at 5:35 PM, BASAK, ANANDA <ab9...@att.com> wrote:
>
>  Thanks. This worked well, as per your suggestions. I had to run the following:
>
> val TABLE_A =
> sc.textFile("/Myhome/SPARK/files/table_a_file.txt").map(_.split("|")).map(p
> => ROW_A(p(0).trim.toLong, p(1), p(2).trim.toInt, p(3), BigDecimal(p(4)),
> BigDecimal(p(5)), BigDecimal(p(6))))
>
>
>
> Now I am stuck at another step. I have run a SQL query where I am
> selecting all the fields with a WHERE clause, TSTAMP filtered by a date
> range, and an ORDER BY TSTAMP clause. That is running fine.
>
>
>
> Then I am trying to store the output in a CSV file. I am using the
> saveAsTextFile("filename") function, but it is giving an error. Can you
> please help me with the proper syntax to store the output in a CSV file?
>
>
>
>
>
> Thanks & Regards
>
> -----------------------
>
> Ananda Basak
>
> Ph: 425-213-7092
>
>
>
> *From:* BASAK, ANANDA
> *Sent:* Tuesday, March 17, 2015 3:08 PM
> *To:* Yin Huai
> *Cc:* user@spark.apache.org
> *Subject:* RE: Date and decimal datatype not working
>
>
>
> Ok, thanks for the suggestions. Let me try and will confirm all.
>
>
>
> Regards
>
> Ananda
>
>
>
> *From:* Yin Huai [mailto:yh...@databricks.com]
> *Sent:* Tuesday, March 17, 2015 3:04 PM
> *To:* BASAK, ANANDA
> *Cc:* user@spark.apache.org
> *Subject:* Re: Date and decimal datatype not working
>
>
>
> p(0) is a String, so you need to explicitly convert it to a Long, e.g.
> p(0).trim.toLong. You also need to do it for p(2). For those BigDecimal
> values, you need to create BigDecimal objects from your String values.
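>
> With those conversions, the ROW_A construction becomes something like:
>
>   ROW_A(p(0).trim.toLong, p(1), p(2).trim.toInt, p(3),
>     BigDecimal(p(4)), BigDecimal(p(5)), BigDecimal(p(6)))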
>
>
>
> On Tue, Mar 17, 2015 at 5:55 PM, BASAK, ANANDA <ab9...@att.com> wrote:
>
>   Hi All,
>
> I am very new to the Spark world. I just started some test coding last
> week. I am using spark-1.2.1-bin-hadoop2.4 and Scala.
>
> I am having issues while using Date and decimal data types. Following is
> my code, which I am simply running at the Scala prompt. I am trying to
> define a table and point it to my flat file containing raw data
> (pipe-delimited format). Once that is done, I will run some SQL queries and
> put the output data into another flat file in pipe-delimited format.
>
>
>
> *******************************************************
>
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>
> import sqlContext.createSchemaRDD
>
>
>
>
>
> // Define row and table
>
> case class ROW_A(
>   TSTAMP:     Long,
>   USIDAN:     String,
>   SECNT:      Int,
>   SECT:       String,
>   BLOCK_NUM:  BigDecimal,
>   BLOCK_DEN:  BigDecimal,
>   BLOCK_PCT:  BigDecimal)
>
>
>
> val TABLE_A =
> sc.textFile("/Myhome/SPARK/files/table_a_file.txt").map(_.split("|")).map(p
> => ROW_A(p(0), p(1), p(2), p(3), p(4), p(5), p(6)))
>
>
>
> TABLE_A.registerTempTable("TABLE_A")
>
>
>
> ***************************************************
>
>
>
> The second-to-last command is giving an error, like the following:
>
> <console>:17: error: type mismatch;
>
> found   : String
>
> required: Long
>
>
>
> Looks like the contents of my flat file are always treated as String and
> not as Date or decimal. How can I make Spark take them as Date or decimal
> types?
>
>
>
> Regards
>
> Ananda
>
>
>
>
>
