Need Help in Spark Hive Data Processing

2016-01-06 Thread Balaraju.Kagidala Kagidala

Hi, I am a new Spark user. I am trying to use Spark DataFrames to process a large amount of Hive data. I have a 5-node Spark cluster with 30 GB of memory on each node, and I want to process a Hive table holding 450 GB of data using DataFrames. Fetching a single row from the Hive table takes 36 minutes. Please suggest what I can do.
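A rough back-of-envelope check of the numbers in the question. It assumes (the poster did not say this) that the "single row" query actually scans the whole 450 GB table and that this scan accounts for most of the 36 minutes:

```python
# Back-of-envelope check of the figures from the question.
# Assumption: the "single row" query scans the entire table, so the
# 36 minutes is roughly the time needed to read all 450 GB.

NODES = 5
MEMORY_PER_NODE_GB = 30
TABLE_SIZE_GB = 450
ELAPSED_MINUTES = 36


def scan_estimates(nodes, mem_per_node_gb, table_gb, minutes):
    """Return (total cluster memory in GB, table-to-memory ratio,
    aggregate scan rate in MB/s, per-node scan rate in MB/s),
    using 1 GB = 1024 MB."""
    total_mem_gb = nodes * mem_per_node_gb
    ratio = table_gb / total_mem_gb
    seconds = minutes * 60
    aggregate_mb_s = table_gb * 1024 / seconds
    per_node_mb_s = aggregate_mb_s / nodes
    return total_mem_gb, ratio, aggregate_mb_s, per_node_mb_s


total_mem, ratio, agg, per_node = scan_estimates(
    NODES, MEMORY_PER_NODE_GB, TABLE_SIZE_GB, ELAPSED_MINUTES
)
print(f"cluster memory: {total_mem} GB, table is {ratio:.1f}x that")
print(f"implied scan rate: {agg:.0f} MB/s total, {per_node:.0f} MB/s per node")
```

Under that assumption the cluster reads only about 213 MB/s in total (about 43 MB/s per node), and the table is about 3x the cluster's total RAM, so caching the whole table in memory is unlikely to be an option. Both points fit the replies below: check how the row is fetched and what format the table is stored in.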

Re: Need Help in Spark Hive Data Processing

2016-01-06 Thread Jeff Zhang

It depends on how you fetch the single row. Is your query complex?

On Thu, Jan 7, 2016 at 12:47 PM, Balaraju.Kagidala Kagidala <balaraju.kagid...@gmail.com> wrote:
> Hi ,
>
> I am new user to spark. I am trying to use Spark to process huge Hive
> data using Spark DataFrames.
>
>
> I have 5
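Jeff's point that the cost "depends on how you fetch the single row" can be illustrated with a toy model. This is not Spark code: `CountingScan`, `first_row`, `first_match`, and `max_by_key` are hypothetical names that only count how many rows each access pattern has to read from a plain Python list.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Row = Tuple[int, str]


class CountingScan:
    """Hypothetical stand-in for a table scan that counts rows read."""

    def __init__(self, rows: List[Row]):
        self.rows = rows
        self.rows_read = 0

    def __iter__(self) -> Iterator[Row]:
        for row in self.rows:
            self.rows_read += 1
            yield row


def first_row(scan: CountingScan) -> Optional[Row]:
    # Like a plain "LIMIT 1": can stop after the first row it reads.
    return next(iter(scan), None)


def first_match(scan: CountingScan, pred: Callable[[Row], bool]) -> Optional[Row]:
    # Like "WHERE ... LIMIT 1": stops at the first match,
    # but may have to read many rows before finding it.
    return next((r for r in scan if pred(r)), None)


def max_by_key(scan: CountingScan) -> Optional[Row]:
    # Like "ORDER BY key DESC LIMIT 1": every row must be read first.
    return max(scan, key=lambda r: r[0], default=None)


table = [(i, f"v{i}") for i in range(100_000)]
for label, query in [
    ("first row", first_row),
    ("first match of key == 99_999", lambda s: first_match(s, lambda r: r[0] == 99_999)),
    ("largest key", max_by_key),
]:
    scan = CountingScan(table)
    query(scan)
    print(f"{label}: read {scan.rows_read} rows")
```

All three queries return "a single row", yet the first reads one row and the other two read the whole table. A real engine can use partition pruning or file-level statistics to skip much of the data even in the filtered case; this toy has no such metadata.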

Re: Need Help in Spark Hive Data Processing

2016-01-06 Thread Jörn Franke

You need the table in an efficient format, such as ORC or Parquet. Have the table sorted appropriately (hint: by the most discriminating column in the WHERE clause). Do not use SAN or virtualization for the slave nodes. Can you please post your query? I always recommend avoiding single updates where
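Here is why sorting on the most discriminating filter column helps. ORC and Parquet files store min/max statistics for blocks of rows (stripes and row groups in ORC, row groups in Parquet), and a reader can skip any block whose range cannot contain the filter value. The sketch below is a toy model in plain Python, not the ORC or Parquet reader. `Block`, `write_blocks`, and `lookup` are hypothetical names, and the block size of 10,000 is arbitrary.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """Hypothetical stand-in for an ORC stripe / Parquet row group:
    the column values plus the min/max statistics a writer records."""
    values: List[int]
    lo: int
    hi: int


def write_blocks(values: List[int], block_size: int) -> List[Block]:
    # Split the column into fixed-size blocks and record min/max per block.
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def lookup(blocks: List[Block], key: int) -> Tuple[List[int], int]:
    """Point lookup "WHERE col = key": skip every block whose [lo, hi]
    range cannot contain key. Returns (matches, number of blocks read)."""
    matches: List[int] = []
    blocks_read = 0
    for b in blocks:
        if key < b.lo or key > b.hi:
            continue  # statistics rule this block out, so it is never read
        blocks_read += 1
        matches.extend(v for v in b.values if v == key)
    return matches, blocks_read


values = list(range(100_000))
random.Random(42).shuffle(values)  # unsorted on-disk layout
unsorted_blocks = write_blocks(values, block_size=10_000)
sorted_blocks = write_blocks(sorted(values), block_size=10_000)

for label, blocks in [("unsorted", unsorted_blocks), ("sorted", sorted_blocks)]:
    found, read = lookup(blocks, key=12_345)
    print(f"{label}: found {found}, read {read} of {len(blocks)} blocks")
```

With shuffled data every block's min/max range spans almost the whole key space, so every block must be read. With sorted data only the one block whose range covers the key is read. Whether Spark actually pushes such a filter down to the file reader depends on the Spark version and settings (for example `spark.sql.orc.filterPushdown` and `spark.sql.parquet.filterPushdown`), so it is worth checking on the cluster in question.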