hudi-bot opened a new issue, #16326:
URL: https://github.com/apache/hudi/issues/16326
"Test Call rollback_to_instant Procedure with refreshTable"
The test fails if a projection is added to the query plan. It does not
currently fail because the projection is not applied for non-partitioned
tables. Adding the projection prevents the RDD from being cached.
Query plans:
Without the projection, caching works:
{code:java}
== Parsed Logical Plan ==
'Project ['id]
+- SubqueryAlias spark_catalog.default.h0
   +- Relation default.h0[_hoodie_commit_time#547,_hoodie_commit_seqno#548,_hoodie_record_key#549,_hoodie_partition_path#550,_hoodie_file_name#551,id#552,name#553,price#554,ts#555L] parquet

== Analyzed Logical Plan ==
id: int
Project [id#552]
+- SubqueryAlias spark_catalog.default.h0
   +- Relation default.h0[_hoodie_commit_time#547,_hoodie_commit_seqno#548,_hoodie_record_key#549,_hoodie_partition_path#550,_hoodie_file_name#551,id#552,name#553,price#554,ts#555L] parquet

== Optimized Logical Plan ==
InMemoryRelation [id#552], StorageLevel(disk, memory, deserialized, 1 replicas)
   +- *(1) ColumnarToRow
      +- FileScan parquet default.h0[id#552] Batched: true, DataFilters: [], Format: Parquet, Location: HoodieFileIndex(1 paths)[file:/private/var/folders/d0/l7mfhzl1661byhh3mbyg5fv00000gn/T/spark-87b3..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:int>

== Physical Plan ==
InMemoryTableScan [id#552]
   +- InMemoryRelation [id#552], StorageLevel(disk, memory, deserialized, 1 replicas)
         +- *(1) ColumnarToRow
            +- FileScan parquet default.h0[id#552] Batched: true, DataFilters: [], Format: Parquet, Location: HoodieFileIndex(1 paths)[file:/private/var/folders/d0/l7mfhzl1661byhh3mbyg5fv00000gn/T/spark-87b3..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:int>
{code}
With the projection, no caching:
{code:java}
== Parsed Logical Plan ==
'Project ['id]
+- SubqueryAlias spark_catalog.default.h0
   +- Relation default.h0[_hoodie_commit_time#539,_hoodie_commit_seqno#540,_hoodie_record_key#541,_hoodie_partition_path#542,_hoodie_file_name#543,id#544,name#545,price#546,ts#547L] parquet

== Analyzed Logical Plan ==
id: int
Project [id#544]
+- SubqueryAlias spark_catalog.default.h0
   +- Relation default.h0[_hoodie_commit_time#539,_hoodie_commit_seqno#540,_hoodie_record_key#541,_hoodie_partition_path#542,_hoodie_file_name#543,id#544,name#545,price#546,ts#547L] parquet

== Optimized Logical Plan ==
Project [id#544]
+- Relation default.h0[_hoodie_commit_time#539,_hoodie_commit_seqno#540,_hoodie_record_key#541,_hoodie_partition_path#542,_hoodie_file_name#543,id#544,name#545,price#546,ts#547L] parquet

== Physical Plan ==
*(1) ColumnarToRow
+- FileScan parquet default.h0[id#544] Batched: true, DataFilters: [], Format: Parquet, Location: HoodieFileIndex(1 paths)[file:/private/var/folders/d0/l7mfhzl1661byhh3mbyg5fv00000gn/T/spark-8c60..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:int>
{code}
## JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-7162
- Type: Bug
- Epic: https://issues.apache.org/jira/browse/HUDI-6568
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]