Re: Spark 2.0.0 - Apply schema on few columns of dataset

Jacek Laskowski Fri, 05 Aug 2016 12:39:28 -0700

Hi Aseem,

Ah, so I can't help you in this area. I've never worked with Spark
using Java (and honestly don't want to if I don't have to).


Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Fri, Aug 5, 2016 at 8:06 PM, Aseem Bansal <asmbans...@gmail.com> wrote:
> Yes, this is what I am after. But I have to use the Java API, and with the
> Java API I was not able to get the .as() function working.
>
> On Fri, Aug 5, 2016 at 7:09 PM, Jacek Laskowski <ja...@japila.pl> wrote:
>>
>> Hi,
>>
>> I don't understand where the issue is...
>>
>> ➜  spark git:(master) ✗ cat csv-logs/people-1.csv
>> name,city,country,age,alive
>> Jacek,Warszawa,Polska,42,true
>>
>> val df = spark.read.option("header", true).csv("csv-logs/people-1.csv")
>> val nameCityPairs = df.select('name, 'city).as[(String, String)]
>>
>> scala> nameCityPairs.printSchema
>> root
>>  |-- name: string (nullable = true)
>>  |-- city: string (nullable = true)
>>
>> Is this what you're after?
>>
>> Regards,
>> Jacek Laskowski
>> ----
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>>
>> On Fri, Aug 5, 2016 at 2:06 PM, Aseem Bansal <asmbans...@gmail.com> wrote:
>> > I need to use only a few columns from a CSV file. As there is no option
>> > to read just a few columns from a CSV, I am:
>> >  1. reading the whole CSV using SparkSession.read().csv()
>> >  2. selecting a few of the columns using DataFrame.select()
>> >  3. applying a schema using the .as() function of Dataset<Row>. I tried
>> >     to extend org.apache.spark.sql.Encoder as the input for the as()
>> >     function.
>> >
>> > But I am getting the following exception:
>> >
>> > Exception in thread "main" java.lang.RuntimeException: Only expression encoders are supported today
>> >
>> > So my questions are:
>> > 1. Is it possible to read only a few columns instead of the whole CSV?
>> >    I cannot change the CSV as it is upstream data.
>> > 2. How do I apply a schema to a few columns if I cannot write my own
>> >    encoder?
>
>
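
For the Java API question raised in this thread, a minimal sketch of the same
select-then-as() pattern might look like the following. It is not from the
thread itself. It assumes the sample file csv-logs/people-1.csv shown above,
a local master, and a hypothetical class name (SelectColumnsExample). Instead
of a hand-written Encoder it uses the built-in factory methods in
org.apache.spark.sql.Encoders, which is likely what the "Only expression
encoders are supported" error is pointing towards.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import scala.Tuple2;

// Hypothetical class name, used only for illustration.
public class SelectColumnsExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("SelectColumnsExample")
                .master("local[*]") // assumption: local run for illustration
                .getOrCreate();

        // Read the whole CSV; the header row supplies the column names.
        Dataset<Row> df = spark.read()
                .option("header", "true")
                .csv("csv-logs/people-1.csv");

        // Keep two columns, then convert to a typed Dataset using the
        // built-in encoders rather than a custom Encoder implementation.
        Dataset<Tuple2<String, String>> nameCityPairs = df
                .select("name", "city")
                .as(Encoders.tuple(Encoders.STRING(), Encoders.STRING()));

        nameCityPairs.printSchema();

        spark.stop();
    }
}
```

If a named type is preferred over a tuple, Encoders.bean(SomeBean.class) with
a JavaBean whose properties match the selected column names should work in
the same position. This is untested here against Spark 2.0.0.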
