terrytlu created HBASE-29272:
--------------------------------

             Summary: When Spark reads an HBase snapshot, it always read empty data.
                 Key: HBASE-29272
                 URL: https://issues.apache.org/jira/browse/HBASE-29272
             Project: HBase
          Issue Type: Bug
            Reporter: terrytlu
         Attachments: HbaseSnapshot.java

We found that when Spark reads an HBase snapshot, it always reads empty data.

This is because org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormatImpl.InputSplit#getLength always returns 0. Spark ignores empty splits when spark.hadoopRDD.ignoreEmptySplits is enabled, and since Spark 3.2.0 (SPARK-34809) the default value of that setting is true. Every snapshot split therefore looks empty to Spark and is skipped.

As a result, the attached program always returns 0 rows on Spark 3.2.0 and later with the default setting, even if the HBase snapshot actually has data.
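
The interaction described above can be illustrated with a small toy model. This is only a sketch and not the actual Spark or HBase code: the `Split` class, `planSplits`, and `snapshotSplits` are hypothetical stand-ins, and the boolean parameter merely mimics the role of spark.hadoopRDD.ignoreEmptySplits.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these types are hypothetical stand-ins, not HBase or Spark classes.
class EmptySplitDemo {

    /** Stand-in for an input split; reportedLength mimics what getLength() returns. */
    static final class Split {
        final String region;
        final long reportedLength;

        Split(String region, long reportedLength) {
            this.region = region;
            this.reportedLength = reportedLength;
        }
    }

    /** Mimics a reader that drops zero-length splits when ignoreEmptySplits is true. */
    static List<Split> planSplits(List<Split> splits, boolean ignoreEmptySplits) {
        List<Split> kept = new ArrayList<>();
        for (Split s : splits) {
            if (!ignoreEmptySplits || s.reportedLength > 0) {
                kept.add(s);
            }
        }
        return kept;
    }

    /** Two splits that hold data but report length 0, as described in the issue. */
    static List<Split> snapshotSplits() {
        List<Split> splits = new ArrayList<>();
        splits.add(new Split("region-a", 0L));
        splits.add(new Split("region-b", 0L));
        return splits;
    }

    public static void main(String[] args) {
        System.out.println("ignoreEmptySplits=true  -> "
                + planSplits(snapshotSplits(), true).size() + " splits");
        System.out.println("ignoreEmptySplits=false -> "
                + planSplits(snapshotSplits(), false).size() + " splits");
    }
}
```

In this model every split is filtered out when the flag is true, which matches the 0 rows observed. It also suggests that setting spark.hadoopRDD.ignoreEmptySplits to false would likely work around the problem until getLength reports a non-zero value, though that has not been verified here.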