codejoyan opened a new issue #4433:
URL: https://github.com/apache/hudi/issues/4433
Spark - 2.4.7
Hive - 2.3.7
Hudi - 0.9.0
I am trying to do a point-in-time read of a Hudi table by specifying a begin
and end time in Hive. I get the expected result in Spark, but not the same
result in Hive.
1. In Hive, even after setting the property `hoodie.table.start.timestamp`,
the query appears to return the snapshot view: it returns the latest value of
each record committed after `hoodie.table.start.timestamp`. If
`hoodie.table.start.timestamp` is set to the latest commit, no rows are
returned.
2. The property `hoodie.table.end.timestamp` does not seem to exist. Can
you please confirm?
3. Is there any way to do a point-in-time read in Hive?
**Spark**
```
scala> val startTime = "000"
scala> val endTime = "20211222134547"
scala> val df = spark.read.format("org.apache.hudi").
| option(VIEW_TYPE_OPT_KEY, VIEW_TYPE_INCREMENTAL_OPT_VAL).
| option(BEGIN_INSTANTTIME_OPT_KEY, startTime).
| option(END_INSTANTTIME_OPT_KEY, endTime).
| load(basepath)
scala> df.where("col1 = '0IN00080048626520210117001808'").select("col1",
"col2", "col3", "col4").show(false)
+---------+-----------------------------+------------+------------+
|col1     |col2                         |col3        |col4        |
+---------+-----------------------------+------------+------------+
|80       |0IN00080048626520210117001808|1           |null        |
+---------+-----------------------------+------------+------------+
scala> val startTime = "20211222134547"
scala> val endTime = "20211222170043"
scala> val df = spark.read.format("org.apache.hudi").
| option(VIEW_TYPE_OPT_KEY, VIEW_TYPE_INCREMENTAL_OPT_VAL).
| option(BEGIN_INSTANTTIME_OPT_KEY, startTime).
| option(END_INSTANTTIME_OPT_KEY, endTime).
| load(basepath)
scala> df.where("col1 = '0IN00080048626520210117001808'").select("col1",
"col2", "col3", "col4").show(false)
+---------+-----------------------------+------------+------------+
|col1     |col2                         |col3        |col4        |
+---------+-----------------------------+------------+------------+
|80       |0IN00080048626520210117001808|1           |5.0         |
+---------+-----------------------------+------------+------------+
```
**Hive**
```
set hoodie.hudi_read_test_4.consume.mode=INCREMENTAL;
set hoodie.hudi_read_test_4.consume.max.commits=3;
set hoodie.hudi_read_test_4.consume.start.timestamp=000;
set hoodie.hudi_read_test_4.consume.end.timestamp=20211222134547;
0: jdbc:hive2://hudi-read-poc-m-0.c.wmt-bfdms> select col1,col2,col3,col4
from stg_db.hudi_read_test_4 where col1 = '0IN00080048626520210117001808';
+------------+--------------------------------+---------------+---------------+
| col1       | col2                           | col3          | col4          |
+------------+--------------------------------+---------------+---------------+
| 80         | 0IN00080048626520210117001808  | 1             | 5.00          |
+------------+--------------------------------+---------------+---------------+
set hoodie.hudi_read_test_4.consume.mode=INCREMENTAL;
set hoodie.hudi_read_test_4.consume.max.commits=3;
set hoodie.hudi_read_test_4.consume.start.timestamp=20211222134547;
set hoodie.hudi_read_test_4.consume.end.timestamp=20211222170043;
0: jdbc:hive2://hudi-read-poc-m-0.c.wmt-bfdms> select col1,col2,col3,col4
from stg_db.hudi_read_test_4 where col1 = '0IN00080048626520210117001808';
+------------+--------------------------------+---------------+---------------+
| col1       | col2                           | col3          | col4          |
+------------+--------------------------------+---------------+---------------+
| 80         | 0IN00080048626520210117001808  | 1             | 5.00          |
+------------+--------------------------------+---------------+---------------+
```
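A diagnostic that may help narrow this down, assuming the table exposes the standard Hudi metadata column `_hoodie_commit_time`: selecting it alongside the data columns shows which commit wrote the row Hive returns, and an explicit range predicate on it keeps only rows committed inside the window. This is a sketch under that assumption, not a confirmed fix. The predicate only filters the rows Hive already returns, so a record updated after the end time would likely be dropped rather than shown with its earlier value.

```sql
-- Session settings reused from the first Hive run in this report.
set hoodie.hudi_read_test_4.consume.mode=INCREMENTAL;
set hoodie.hudi_read_test_4.consume.max.commits=3;
set hoodie.hudi_read_test_4.consume.start.timestamp=000;

-- Show the commit that produced each returned row, and bound the window
-- explicitly on the Hudi metadata column (instant times compare as strings).
select `_hoodie_commit_time`, col1, col2, col3, col4
from stg_db.hudi_read_test_4
where col1 = '0IN00080048626520210117001808'
  and `_hoodie_commit_time` > '000'
  and `_hoodie_commit_time` <= '20211222134547';
```

If the row comes back with a commit time later than `20211222134547` when the predicate is removed, that would confirm Hive is serving the latest version of the record rather than the version as of the end time.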