[ https://issues.apache.org/jira/browse/HUDI-651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
sivabalan narayanan updated HUDI-651:
-------------------------------------
Fix Version/s: 0.11.0
(was: 0.10.0)
> Incremental Query on Hive via Spark SQL does not return expected results
> ------------------------------------------------------------------------
>
> Key: HUDI-651
> URL: https://issues.apache.org/jira/browse/HUDI-651
> Project: Apache Hudi
> Issue Type: Bug
> Components: Spark Integration
> Reporter: Vinoth Chandar
> Assignee: sivabalan narayanan
> Priority: Critical
> Labels: pull-request-available, sev:critical, user-support-issues
> Fix For: 0.11.0
>
>
> Using the docker demo, I added two delta commits to a MOR table and was
> hoping to consume them incrementally, as with Hive QL. Something is amiss:
> {code}
> scala> spark.sparkContext.hadoopConfiguration.set("hoodie.stock_ticks_mor_rt.consume.start.timestamp","20200302210147")
> scala> spark.sparkContext.hadoopConfiguration.set("hoodie.stock_ticks_mor_rt.consume.mode","INCREMENTAL")
> scala> spark.sql("select distinct `_hoodie_commit_time` from stock_ticks_mor_rt").show(100, false)
> +-------------------+
> |_hoodie_commit_time|
> +-------------------+
> |20200302210010     |
> |20200302210147     |
> +-------------------+
> scala> sc.setLogLevel("INFO")
> scala> spark.sql("select distinct `_hoodie_commit_time` from stock_ticks_mor_rt").show(100, false)
> 20/03/02 21:15:37 INFO aggregate.HashAggregateExec: spark.sql.codegen.aggregate.map.twolevel.enabled is set to true, but current version of codegened fast hashmap does not support this aggregate.
> 20/03/02 21:15:37 INFO aggregate.HashAggregateExec: spark.sql.codegen.aggregate.map.twolevel.enabled is set to true, but current version of codegened fast hashmap does not support this aggregate.
> 20/03/02 21:15:37 INFO memory.MemoryStore: Block broadcast_44 stored as values in memory (estimated size 292.3 KB, free 365.3 MB)
> 20/03/02 21:15:37 INFO memory.MemoryStore: Block broadcast_44_piece0 stored as bytes in memory (estimated size 25.4 KB, free 365.3 MB)
> 20/03/02 21:15:37 INFO storage.BlockManagerInfo: Added broadcast_44_piece0 in memory on adhoc-1:45623 (size: 25.4 KB, free: 366.2 MB)
> 20/03/02 21:15:37 INFO spark.SparkContext: Created broadcast 44 from
> 20/03/02 21:15:37 INFO hadoop.HoodieParquetInputFormat: Reading hoodie metadata from path hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor
> 20/03/02 21:15:37 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor
> 20/03/02 21:15:37 INFO util.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://namenode:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@5a66fc27, file:/etc/hadoop/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-1645984031_1, ugi=root (auth:SIMPLE)]]]
> 20/03/02 21:15:37 INFO table.HoodieTableConfig: Loading table properties from hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor/.hoodie/hoodie.properties
> 20/03/02 21:15:37 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1) from hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor
> 20/03/02 21:15:37 INFO mapred.FileInputFormat: Total input paths to process : 1
> 20/03/02 21:15:37 INFO hadoop.HoodieParquetInputFormat: Found a total of 1 groups
> 20/03/02 21:15:37 INFO timeline.HoodieActiveTimeline: Loaded instants [[20200302210010__clean__COMPLETED], [20200302210010__deltacommit__COMPLETED], [20200302210147__clean__COMPLETED], [20200302210147__deltacommit__COMPLETED]]
> 20/03/02 21:15:37 INFO view.HoodieTableFileSystemView: Adding file-groups for partition :2018/08/31, #FileGroups=1
> 20/03/02 21:15:37 INFO view.AbstractTableFileSystemView: addFilesToView: NumFiles=1, FileGroupsCreationTime=0, StoreTimeTaken=0
> 20/03/02 21:15:37 INFO hadoop.HoodieParquetInputFormat: Total paths to process after hoodie filter 1
> 20/03/02 21:15:37 INFO hadoop.HoodieParquetInputFormat: Reading hoodie metadata from path hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor
> 20/03/02 21:15:37 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor
> 20/03/02 21:15:37 INFO util.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://namenode:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@5a66fc27, file:/etc/hadoop/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-1645984031_1, ugi=root (auth:SIMPLE)]]]
> 20/03/02 21:15:37 INFO table.HoodieTableConfig: Loading table properties from hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor/.hoodie/hoodie.properties
> 20/03/02 21:15:37 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1) from hdfs://namenode:8020/user/hive/warehouse/stock_ticks_mor
> 20/03/02 21:15:37 INFO timeline.HoodieActiveTimeline: Loaded instants [[20200302210010__clean__COMPLETED], [20200302210010__deltacommit__COMPLETED], [20200302210147__clean__COMPLETED], [20200302210147__deltacommit__COMPLETED]]
> 20/03/02 21:15:37 INFO view.AbstractTableFileSystemView: Building file system view for partition (2018/08/31)
> 20/03/02 21:15:37 INFO view.AbstractTableFileSystemView: #files found in partition (2018/08/31) =3, Time taken =1
> 20/03/02 21:15:37 INFO view.HoodieTableFileSystemView: Adding file-groups for partition :2018/08/31, #FileGroups=1
> 20/03/02 21:15:37 INFO view.AbstractTableFileSystemView: addFilesToView: NumFiles=3, FileGroupsCreationTime=0, StoreTimeTaken=0
> 20/03/02 21:15:37 INFO view.AbstractTableFileSystemView: Time to load partition (2018/08/31) =2
> 20/03/02 21:15:37 INFO realtime.HoodieParquetRealtimeInputFormat: Returning a total splits of 1
> {code}
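The issue does not spell out the expected output. As a minimal Scala sketch, assuming that `consume.start.timestamp` is exclusive (only completed commits strictly after the start instant are pulled, which is our understanding of Hudi's incremental pull), the filter below models what an incremental read of this table should return. `IncrementalFilterSketch` and `commitsAfter` are hypothetical names for illustration, not Hudi APIs. Under that assumption, a start of 20200302210147 should yield no commits here, whereas the Spark SQL query above returned both 20200302210010 and 20200302210147.

```scala
// Hypothetical model of the expected incremental-pull filter; not a Hudi API.
// Assumption: the start timestamp is exclusive, so only instants strictly
// after it are returned.
object IncrementalFilterSketch {

  /** Returns the completed commit instants strictly after `startTimestamp`, oldest first. */
  def commitsAfter(completed: Seq[String], startTimestamp: String): Seq[String] =
    // Instant times are fixed-width yyyyMMddHHmmss strings, so lexicographic
    // order matches chronological order.
    completed.sorted.filter(_.compareTo(startTimestamp) > 0)

  def main(args: Array[String]): Unit = {
    // Delta commits reported as COMPLETED in the log above.
    val deltaCommits = Seq("20200302210010", "20200302210147")
    println(commitsAfter(deltaCommits, "20200302210147")) // List()
    println(commitsAfter(deltaCommits, "20200302210010")) // List(20200302210147)
  }
}
```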