[ 
https://issues.apache.org/jira/browse/HUDI-258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishith Agarwal reassigned HUDI-258:
------------------------------------

    Assignee: Nishith Agarwal

> Hive Query engine not supporting join queries between RT and RO tables
> ----------------------------------------------------------------------
>
>                 Key: HUDI-258
>                 URL: https://issues.apache.org/jira/browse/HUDI-258
>             Project: Apache Hudi (incubating)
>          Issue Type: Bug
>          Components: Hive Integration
>            Reporter: Balaji Varadarajan
>            Assignee: Nishith Agarwal
>            Priority: Major
>
> Description : 
> [https://github.com/apache/incubator-hudi/issues/789#issuecomment-512740619]
>  
> Root Cause: Hive tracks getSplits() calls by dataset base path and does not 
> take the InputFormatClass into account, so getSplits() is called only once. 
> The RO and RT tables share the same dataset base path but differ in their 
> InputFormatClass. As a result, a Hive join query between them returns 
> incorrect results.
>  
> =============
> The results of the demo (Step 6(a)) are very strange.
>  
> {{ select `_hoodie_commit_time`, symbol, ts, volume, open, close  from 
> stock_ticks_mor_rt where  symbol = 'GOOG';
>  select `_hoodie_commit_time`, symbol, ts, volume, open, close  from 
> stock_ticks_mor where  symbol = 'GOOG';}}
> These return the results shown in the demo, but the join queries below do 
> not behave as expected:
>  
> {{select a.key,a.ts, b.ts from stock_ticks_mor a join stock_ticks_mor_rt b  
> on a.key=b.key where a.ts != b.ts
> ...
> +--------+-------+-------+--+
> | a.key  | a.ts  | b.ts  |
> +--------+-------+-------+--+
> +--------+-------+-------+--+}}
>  
> {{0: jdbc:hive2://hiveserver:10000> select a.key,a.ts,b.ts from 
> stock_ticks_mor_rt a join stock_ticks_mor b on a.key = b.key where a.key= 
> 'GOOG_2018-08-31 10';
> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
> future versions. Consider using a different execution engine (i.e. spark, 
> tez) or using Hive 1.X releases.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/hadoop-2.8.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Execution log at: 
> /tmp/root/root_20190718091316_ec40e8f2-be17-4450-bb75-8db9f4390041.log
> 2019-07-18 09:13:20 Starting to launch local task to process map join;  
> maximum memory = 477626368
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
> details.
> 2019-07-18 09:13:21 Dump the side-table for tag: 0 with group count: 1 into 
> file: 
> file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-16_658_8306103829282410332-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile50--.hashtable
> 2019-07-18 09:13:21 Uploaded 1 File to: 
> file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-16_658_8306103829282410332-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile50--.hashtable
>  (317 bytes)
> 2019-07-18 09:13:21 End of local task; Time Taken: 1.688 sec.
> +---------------------+----------------------+----------------------+--+
> |        a.key        |         a.ts         |         b.ts         |
> +---------------------+----------------------+----------------------+--+
> | GOOG_2018-08-31 10  | 2018-08-31 10:29:00  | 2018-08-31 10:29:00  |
> +---------------------+----------------------+----------------------+--+
> 1 row selected (7.207 seconds)
> 0: jdbc:hive2://hiveserver:10000> select a.key,a.ts,b.ts from stock_ticks_mor 
> a join stock_ticks_mor_rt b on a.key = b.key where a.key= 'GOOG_2018-08-31 
> 10';
> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
> future versions. Consider using a different execution engine (i.e. spark, 
> tez) or using Hive 1.X releases.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/hadoop-2.8.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Execution log at: 
> /tmp/root/root_20190718091348_72a5fc30-fc04-41c1-b2e3-5f943e4d5c08.log
> 2019-07-18 09:13:51 Starting to launch local task to process map join;  
> maximum memory = 477626368
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
> details.
> 2019-07-18 09:13:53 Dump the side-table for tag: 0 with group count: 1 into 
> file: 
> file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-48_027_3613368446029280476-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile60--.hashtable
> 2019-07-18 09:13:53 Uploaded 1 File to: 
> file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-48_027_3613368446029280476-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile60--.hashtable
>  (317 bytes)
> 2019-07-18 09:13:53 End of local task; Time Taken: 2.36 sec.
> +---------------------+----------------------+----------------------+--+
> |        a.key        |         a.ts         |         b.ts         |
> +---------------------+----------------------+----------------------+--+
> | GOOG_2018-08-31 10  | 2018-08-31 10:59:00  | 2018-08-31 10:59:00  |
> +---------------------+----------------------+----------------------+--+}}
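
The root cause quoted above (split generation tracked by base path only) can be illustrated with a small model. This is a sketch under stated assumptions, not Hive or Hudi code: `Table`, `plan_splits`, the input format names, and the base path are all hypothetical stand-ins chosen for illustration. It only shows why keying on the base path alone handles two tables as one, while keying on the base path together with the input format class keeps them apart.

```python
from collections import namedtuple

# Hypothetical stand-in for a table definition; not a Hive or Hudi class.
Table = namedtuple("Table", ["name", "base_path", "input_format"])


def plan_splits(tables, key_fn):
    """Model a planner that generates splits once per distinct key."""
    seen = set()
    calls = []
    for table in tables:
        key = key_fn(table)
        if key in seen:
            # Key already tracked: no separate split generation for this table.
            continue
        seen.add(key)
        calls.append((table.name, table.input_format))
    return calls


# Illustrative values only: both tables point at the same base path.
tables = [
    Table("stock_ticks_mor", "/path/to/stock_ticks_mor", "ReadOptimizedInputFormat"),
    Table("stock_ticks_mor_rt", "/path/to/stock_ticks_mor", "RealtimeInputFormat"),
]

# Keyed by base path only: the second table is skipped.
by_path = plan_splits(tables, lambda t: t.base_path)

# Keyed by (base path, input format): each table gets its own call.
by_path_and_format = plan_splits(tables, lambda t: (t.base_path, t.input_format))

print(len(by_path), len(by_path_and_format))
```

In this model, the path-only key yields one split-generation call and the combined key yields two, which matches the symptom in the join output above where both sides of the join show the same `ts` value.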



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
