[ https://issues.apache.org/jira/browse/KUDU-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Qutiba updated KUDU-1533:
-------------------------
Description:
It is not clear how to upsert a kuduRdd into an existing Kudu table.
The documentation, under "Kudu integration with Spark", lists
some possible operations:
***********************************************
// then we can insert data into the kudu table
df.write.options(Map("kudu.master" -> "your.kudu.master.here", "kudu.table" ->
"your.kudu.table.here")).mode("append").kudu
// to update existing data change the mode to 'overwrite'
df.write.options(Map("kudu.master" -> "your.kudu.master.here", "kudu.table" ->
"your.kudu.table.here")).mode("overwrite").kudu
****************************************************************
But there is no way to request an upsert in the same style, for example:
kuduDataFrame.write.options(Map("kudu.master" -> Kudu_Master, "kudu.table" ->
TargetTable)).mode("upsert").kudu
***************************************************************
The current workaround, which is quite slow, is:
Call DataFrame.foreachPartition
- open the table
- create a session
-- for each row in this partition
--- create an upsert operation
--- get the row from the operation
--- add all fields and values to this row
--- apply the operation
----------------------------------
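For reference, the workaround steps above can be sketched roughly as follows.
This is an untested sketch, not the exact code in use. It assumes the Kudu Java
client under the org.apache.kudu.client package (older releases used
org.kududb.client) and a table whose columns are all strings, ints, or longs.
The name upsertPartitions is made up for illustration, and the sketch goes
through df.rdd so that only one foreachPartition signature applies.

```scala
import org.apache.kudu.client.KuduClient
import org.apache.spark.sql.{DataFrame, Row}

// Hypothetical helper: upserts every row of `df` into an existing Kudu table.
// Assumption: DataFrame column names match the Kudu column names.
def upsertPartitions(df: DataFrame, kuduMaster: String, tableName: String): Unit = {
  val columns = df.columns // captured on the driver, shipped to the executors
  df.rdd.foreachPartition { (rows: Iterator[Row]) =>
    // One client per partition; Kudu clients are not serializable.
    val client = new KuduClient.KuduClientBuilder(kuduMaster).build()
    try {
      val table = client.openTable(tableName) // open the table
      val session = client.newSession()       // create a session
      try {
        rows.foreach { r =>
          val upsert = table.newUpsert()      // create an upsert operation
          val kuduRow = upsert.getRow         // get the row from the operation
          // add all fields and values to this row
          columns.zipWithIndex.foreach { case (name, i) =>
            r.get(i) match {
              case null      => kuduRow.setNull(name)
              case s: String => kuduRow.addString(name, s)
              case n: Int    => kuduRow.addInt(name, n)
              case l: Long   => kuduRow.addLong(name, l)
              case other     =>
                throw new IllegalArgumentException(
                  s"Unsupported type for column $name: ${other.getClass}")
            }
          }
          // apply the operation; the response (and any row error) is ignored here
          session.apply(upsert)
        }
      } finally {
        session.close()
      }
    } finally {
      client.close()
    }
  }
}
```

It would be called as upsertPartitions(kuduDataFrame, Kudu_Master, TargetTable).
Each operation is applied one at a time through the session, which is the
per-row overhead the request below is trying to avoid.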
This workaround is quite slow, so adding an upsert mode to the DataFrame writer
for Kudu tables would be better than opening sessions and creating operations
by hand as in the workaround above:
kuduDataFrame.write.options(Map("kudu.master" -> Kudu_Master, "kudu.table" ->
TargetTable)).mode("upsert").kudu
> Spark Kudu Rdd/Dataframe upsert
> --------------------------------
>
> Key: KUDU-1533
> URL: https://issues.apache.org/jira/browse/KUDU-1533
> Project: Kudu
> Issue Type: Bug
> Environment: Spark
> Reporter: Qutiba
>
> It is not clear how to upsert a kuduRdd into an existing Kudu table.
> The documentation, under "Kudu integration with Spark", lists
> some possible operations:
> ***********************************************
> // then we can insert data into the kudu table
> df.write.options(Map("kudu.master" -> "your.kudu.master.here", "kudu.table" ->
> "your.kudu.table.here")).mode("append").kudu
> // to update existing data change the mode to 'overwrite'
> df.write.options(Map("kudu.master" -> "your.kudu.master.here", "kudu.table" ->
> "your.kudu.table.here")).mode("overwrite").kudu
> ****************************************************************
> But there is no way to request an upsert in the same style, for example:
> kuduDataFrame.write.options(Map("kudu.master" -> Kudu_Master, "kudu.table" ->
> TargetTable)).mode("upsert").kudu
> ***************************************************************
> The current workaround, which is quite slow, is:
> Call DataFrame.foreachPartition
> - open the table
> - create a session
> -- for each row in this partition
> --- create an upsert operation
> --- get the row from the operation
> --- add all fields and values to this row
> --- apply the operation
> ----------------------------------
> This workaround is quite slow, so adding an upsert mode to the DataFrame
> writer for Kudu tables would be better than opening sessions and creating
> operations by hand as in the workaround above:
> kuduDataFrame.write.options(Map("kudu.master" -> Kudu_Master, "kudu.table" ->
> TargetTable)).mode("upsert").kudu
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)