sivabalan narayanan created HUDI-2250:
-----------------------------------------

             Summary: Bulk insert can work with tables w/o primary key
                 Key: HUDI-2250
                 URL: https://issues.apache.org/jira/browse/HUDI-2250
             Project: Apache Hudi
          Issue Type: Sub-task
            Reporter: sivabalan narayanan


We want to support bulk insert for any table. Right now, there is a
constraint that only tables without a primary key can be bulk inserted.
The spark-sql session below reproduces the failure.

 

spark-sql> set hoodie.sql.bulk.insert.enable = true;
hoodie.sql.bulk.insert.enable true
Time taken: 2.019 seconds, Fetched 1 row(s)
spark-sql> set hoodie.datasource.write.row.writer.enable = true;
hoodie.datasource.write.row.writer.enable true
Time taken: 0.026 seconds, Fetched 1 row(s)
spark-sql> create table hudi_17Gb_ext1 using hudi location 's3a://siva-test-bucket-june-16/hudi_testing/gh_arch_dump/hudi_5/' options (
         >   type = 'cow',
         >   primaryKey = 'randomId',
         >   preCombineField = 'date_col'
         >  )
         > partitioned by (type) as select * from gh_17Gb_date_col;

21/07/29 04:26:15 ERROR SparkSQLDriver: Failed in [create table hudi_17Gb_ext1 using hudi location 's3a://siva-test-bucket-june-16/hudi_testing/gh_arch_dump/hudi_5/' options (
  type = 'cow',
  primaryKey = 'randomId',
  preCombineField = 'date_col'
 )
partitioned by (type) as select * from gh_17Gb_date_col]
java.lang.IllegalArgumentException: Table with primaryKey can not use bulk insert.
	at org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand$.buildHoodieInsertConfig(InsertIntoHoodieTableCommand.scala:219)
	at org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand$.run(InsertIntoHoodieTableCommand.scala:78)
	at org.apache.spark.sql.hudi.command.CreateHoodieTableAsSelectCommand.run(CreateHoodieTableAsSelectCommand.scala:86)
	at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
	at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
	at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:120)
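
The trace shows the exception being raised from buildHoodieInsertConfig
when bulk insert is enabled on a table that declares a primaryKey. As a
rough illustration only, the constraint and the requested relaxation
could be modelled as below. This is a Python sketch, not Hudi source
code: the function name resolve_bulk_insert and the
allow_with_primary_key flag are hypothetical, and only the error message
is taken from the log above.

```python
def resolve_bulk_insert(has_primary_key: bool,
                        bulk_insert_enabled: bool,
                        allow_with_primary_key: bool = False) -> bool:
    """Return True if the write should go through bulk insert.

    Hypothetical model of the guard seen in the stack trace; the real
    check lives in InsertIntoHoodieTableCommand.buildHoodieInsertConfig
    and may differ in detail.
    """
    # Current behaviour (assumed from the error message): a table with a
    # primary key is rejected as soon as bulk insert is enabled.
    if bulk_insert_enabled and has_primary_key and not allow_with_primary_key:
        raise ValueError("Table with primaryKey can not use bulk insert.")
    # Otherwise bulk insert is used exactly when it was requested.
    return bulk_insert_enabled


if __name__ == "__main__":
    # Table without a primary key: bulk insert is accepted today.
    print(resolve_bulk_insert(has_primary_key=False, bulk_insert_enabled=True))
    # Table with a primary key: rejected today, as in the reported CTAS.
    try:
        resolve_bulk_insert(has_primary_key=True, bulk_insert_enabled=True)
    except ValueError as err:
        print(err)
    # Behaviour requested by this issue, modelled by the hypothetical flag.
    print(resolve_bulk_insert(True, True, allow_with_primary_key=True))
```

Under this model, the change requested here amounts to dropping the
guard (or making it configurable) so that the second case no longer
fails.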



