How can I use a custom Partitioner hashed by userId inside saveAsTable with a
DataFrame?
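
As far as I can tell, the DataFrame writer has no hook that accepts an org.apache.spark.Partitioner. One possible workaround, assuming Spark 1.6 or later, is to hash-partition the DataFrame on the userId column with repartition before calling saveAsTable. The sketch below is only an illustration: the User case class, its field names, the table name user_table and the partition count of 8 are all made up for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SaveMode}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.hive.HiveContext

// Hypothetical record type; field names are assumptions for illustration.
case class User(userId: String, numSessions: Int, visitCount: Int)

object SaveUsersHashedByUserId {
  // Hash-partitions rows on userId so that all rows for one user
  // end up in the same partition of the resulting DataFrame.
  def hashByUserId(df: DataFrame, numPartitions: Int): DataFrame =
    df.repartition(numPartitions, col("userId"))

  def main(args: Array[String]): Unit = {
    // local[2] is only for trying the sketch out; drop it under spark-submit.
    val sc = new SparkContext(
      new SparkConf().setAppName("save-users").setMaster("local[2]"))
    val hiveContext = new HiveContext(sc)
    import hiveContext.implicits._

    val users = sc.parallelize(Seq(
      User("u1", 3, 10), User("u2", 1, 2), User("u3", 7, 40))).toDF()

    // 8 partitions is an arbitrary choice; "user_table" is a hypothetical name.
    hashByUserId(users, 8)
      .write.mode(SaveMode.Overwrite).saveAsTable("user_table")

    sc.stop()
  }
}
```

Note that this only controls how rows are grouped when the table is written. It is not the same as a Hive table partitioned by userId, and a truly custom Partitioner would have to be applied on a keyed RDD (partitionBy) before converting back to a DataFrame.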
On Mon, Feb 15, 2016 at 11:24 AM, swetha kasireddy <
swethakasire...@gmail.com> wrote:
> How about saving the dataframe as a table partitioned by userId? My User
> records have userId, number of sessions, visit
How about saving the dataframe as a table partitioned by userId? My User
records have userId, number of sessions, visit count etc as the columns and
it should be partitioned by userId. I will need to join the userTable saved
in the database as follows with an incoming session RDD. The session RDD
OK. Would it query only the records that I want from Hive, as per the filter,
or just load the entire table? My user table will have millions of records,
and I do not want to cause OOM errors by loading the entire table into memory.
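
For what it is worth, DataFrame queries are evaluated lazily, so a filter becomes part of the query plan rather than something applied after the whole table has been materialized. A minimal sketch, assuming a Hive table named user_table with a userId column already exists (both names are hypothetical), could look like this:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.hive.HiveContext

object FilteredUserLookup {
  // Builds a lazy query; nothing is read from Hive until an action runs.
  // "user_table" and "userId" are hypothetical names.
  def usersOfInterest(hiveContext: HiveContext, ids: Seq[String]): DataFrame =
    hiveContext.table("user_table").filter(col("userId").isin(ids: _*))
}
```

Calling explain() on the returned DataFrame prints the physical plan, which shows where the filter is applied. How much data is actually skipped at the storage level depends on the file format and on how the table is partitioned, so that part should be checked against the real table. Actions such as collect() bring the result to the driver, so they are best avoided on large results.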
On Mon, Feb 15, 2016 at 12:51 AM, Mich Talebzadeh
It is also worthwhile using temporary tables for the join query.
I can join a Hive table with any JDBC-accessed table from any other
database using DataFrames and temporary tables.
//
//Get the FACT table from Hive
//
var s = HiveContext.sql("SELECT AMOUNT_SOLD, TIME_ID, CHANNEL_ID FROM
Have you tried creating a DataFrame from the RDD and joining it with the
DataFrame that corresponds to the Hive table?
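
A rough sketch of that suggestion, assuming the RDD holds a hypothetical Session case class and the Hive table is called user_table with a userId column (none of these names come from the original question):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.hive.HiveContext

// Hypothetical session record; field names are assumptions for illustration.
case class Session(userId: String, sessionId: String)

object JoinSessionsWithUsers {
  // Converts the session RDD to a DataFrame and inner-joins it with the
  // Hive table on userId, so only users that appear in the RDD are returned.
  def joinWithUsers(hiveContext: HiveContext, sessions: RDD[Session]): DataFrame = {
    import hiveContext.implicits._
    val sessionDf = sessions.toDF()
    val userDf = hiveContext.table("user_table") // hypothetical table name
    sessionDf.join(userDf, "userId")
  }
}
```

Because the join is an inner join on userId, rows of the Hive table with no matching session are dropped from the result.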
On Sun, Feb 14, 2016 at 9:53 PM, SRK wrote:
> Hi,
>
> How do I join an RDD with a Hive table and retrieve only the records that I
> am interested in? Suppose, I