Hi,

I am a new Spark user. I am trying to use Spark DataFrames to process a large amount of Hive data.

I have a 5-node Spark cluster, each node with 30 GB of memory. I want to process a Hive table with 450 GB of data using DataFrames. Fetching a single row from the Hive table takes 36 minutes. Please suggest what I can do.
It depends on how you fetch the single row. Is your query complex?
On Thu, Jan 7, 2016 at 12:47 PM, Balaraju.Kagidala Kagidala <
balaraju.kagid...@gmail.com> wrote:
You need the table in an efficient format, such as ORC or Parquet. Have the table sorted appropriately (hint: by the most discriminating column in the WHERE clause). Do not use SAN storage or virtualization for the slave nodes.
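To illustrate the format advice above, here is a minimal HiveQL sketch; the table name `sales` and the filter column `customer_id` are hypothetical placeholders, not from the original question:

```sql
-- Hypothetical names: rewrite an existing Hive table into ORC,
-- sorted by the most discriminating WHERE-clause column so that
-- ORC's per-stripe min/max statistics can skip irrelevant data on read.
CREATE TABLE sales_orc STORED AS ORC AS
SELECT * FROM sales
SORT BY customer_id;
```

With the data in ORC (or Parquet), a DataFrame filter such as `df.filter("customer_id = 12345")` can push the predicate down and prune whole stripes, instead of scanning all 450 GB to return one row.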
Can you please post your query?
I always recommend avoiding single updates where