[
https://issues.apache.org/jira/browse/HUDI-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17374056#comment-17374056
]
ASF GitHub Bot commented on HUDI-2058:
--------------------------------------
hudi-bot edited a comment on pull request #3139:
URL: https://github.com/apache/hudi/pull/3139#issuecomment-866671675
<!--
Meta data
{
"version" : 1,
"metaDataEntries" : [ {
"hash" : "29536de9ee87bbc207daefef84b80b23c52fca9d",
"status" : "DELETED",
"url" :
"https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=376",
"triggerID" : "29536de9ee87bbc207daefef84b80b23c52fca9d",
"triggerType" : "PUSH"
}, {
"hash" : "ebd7ae14e0fafb413abdd4561361b2fbbd398473",
"status" : "DELETED",
"url" :
"https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=410",
"triggerID" : "ebd7ae14e0fafb413abdd4561361b2fbbd398473",
"triggerType" : "PUSH"
}, {
"hash" : "42066935f5571aad75be3d27ba3321b6e61f4b22",
"status" : "FAILURE",
"url" :
"https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=424",
"triggerID" : "42066935f5571aad75be3d27ba3321b6e61f4b22",
"triggerType" : "PUSH"
}, {
"hash" : "f189b69eb1d47720891a49f76c3e6edf3d9cf557",
"status" : "UNKNOWN",
"url" : "TBD",
"triggerID" : "f189b69eb1d47720891a49f76c3e6edf3d9cf557",
"triggerType" : "PUSH"
} ]
}-->
## CI report:
* 42066935f5571aad75be3d27ba3321b6e61f4b22 Azure:
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=424)
* f189b69eb1d47720891a49f76c3e6edf3d9cf557 UNKNOWN
<details>
<summary>Bot commands</summary>
@hudi-bot supports the following commands:
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
</details>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
> support incremental query for insert_overwrite_table/insert_overwrite
> operation on cow table
> --------------------------------------------------------------------------------------------
>
> Key: HUDI-2058
> URL: https://issues.apache.org/jira/browse/HUDI-2058
> Project: Apache Hudi
> Issue Type: Bug
> Components: Incremental Pull
> Affects Versions: 0.8.0
> Environment: hadoop 3.1.1
> spark3.1.1
> hive 3.1.1
> Reporter: tao meng
> Assignee: tao meng
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.9.0
>
>
> When an incremental query spans multiple commits before and after a
> replacecommit, the query result contains data from the old (replaced) files.
> Note: MOR tables are fine; only COW tables have this problem.
>
> When querying the incremental view of a COW table, the replacecommit is
> ignored, which leads to the wrong result.
>
>
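To make the failure mode concrete, here is a simplified, self-contained model of the incremental-read semantics around a replacecommit. This is illustrative only, not actual Hudi code; the `Instant` fields and `visibleFileGroups` helper are invented for the sketch.

```scala
// Simplified model of an incremental read over a Hudi timeline (illustrative
// only -- all names here are invented, not Hudi internals).
case class Instant(time: String,
                   action: String,                    // "commit" or "replacecommit"
                   written: Set[String],              // file groups this instant wrote
                   replaced: Set[String] = Set.empty) // file groups it invalidated

// An incremental read over (begin, end] should exclude any file group that a
// replacecommit inside the range invalidated. The reported COW bug is that
// this exclusion does not happen, so old file groups leak into the result.
def visibleFileGroups(timeline: Seq[Instant], begin: String, end: String): Set[String] = {
  val inRange  = timeline.filter(i => i.time > begin && i.time <= end)
  val written  = inRange.flatMap(_.written).toSet
  val replaced = inRange.flatMap(_.replaced).toSet
  written -- replaced
}

val timeline = Seq(
  Instant("001", "commit",        written = Set("fg-old")),  // step 2: insert
  Instant("002", "replacecommit", written = Set("fg-new"),   // step 3: insert_overwrite_table
                                  replaced = Set("fg-old")))

println(visibleFileGroups(timeline, "0000", "002"))  // correct behavior: Set(fg-new)
```

Reading both `fg-old` and `fg-new` here is what produces the duplicated rows shown in the repro below.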
> test step:
> step1: create dataFrame
> val df = spark.range(0, 10).toDF("keyid")
> .withColumn("col3", expr("keyid"))
> .withColumn("age", lit(1))
> .withColumn("p", lit(2))
>
> step2: insert df into an empty Hudi table
> df.write.format("hudi").
>   option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL).
>   option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "col3").
>   option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "keyid").
>   option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "").
>   option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, "org.apache.hudi.keygen.NonpartitionedKeyGenerator").
>   option(DataSourceWriteOptions.OPERATION_OPT_KEY, "insert").
>   option("hoodie.insert.shuffle.parallelism", "4").
>   option(HoodieWriteConfig.TABLE_NAME, "hoodie_test").
>   mode(SaveMode.Overwrite).save(basePath)
>
> step3: write again with operation insert_overwrite_table
> df.write.format("hudi").
>   option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL).
>   option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "col3").
>   option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "keyid").
>   option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "").
>   option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, "org.apache.hudi.keygen.NonpartitionedKeyGenerator").
>   option(DataSourceWriteOptions.OPERATION_OPT_KEY, "insert_overwrite_table").
>   option("hoodie.insert.shuffle.parallelism", "4").
>   option(HoodieWriteConfig.TABLE_NAME, "hoodie_test").
>   mode(SaveMode.Append).save(basePath)
>
> step4: run an incremental query over the table
> spark.read.format("hudi").
>   option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY, DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL).
>   option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY, "0000").
>   option(DataSourceReadOptions.END_INSTANTTIME_OPT_KEY, currentCommits(0)).
>   load(basePath).select("keyid").orderBy("keyid").show(100, false)
>
> result: the query returns the old (replaced) data alongside the new data
> +-----+
> |keyid|
> +-----+
> |0 |
> |0 |
> |1 |
> |1 |
> |2 |
> |2 |
> |3 |
> |3 |
> |4 |
> |4 |
> |5 |
> |5 |
> |6 |
> |6 |
> |7 |
> |7 |
> |8 |
> |8 |
> |9 |
> |9 |
> +-----+
>
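For contrast, the expected step-4 result (an inference from insert_overwrite_table semantics, not output shown in the source) is each keyid exactly once, since the replacecommit in step 3 invalidated the file group written in step 2. A trivial sketch of observed vs. expected:

```scala
// Illustrative only: model of the buggy vs. expected step-4 output.
val expected = (0 until 10).toList                           // keyids 0..9, once each
val observed = (0 until 10).toList.flatMap(k => List(k, k))  // each keyid twice, as shown above
// The duplicates are exactly the rows from the replaced (old) file group;
// dropping them recovers the expected result.
assert(observed.distinct == expected)
```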
--
This message was sent by Atlassian Jira
(v8.3.4#803005)